The fundamental premise of Hadoop

The fundamental premise of Hadoop is that instead of attempting to perform a task on a single large machine, the task can be subdivided into smaller segments and delegated to multiple smaller machines. Each of these machines then performs the work on its own portion of the data. Once all of them have completed the segments they were allocated, the individual partial results are aggregated to produce the final result.

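To make the idea concrete, the following is a minimal, single-process sketch of this divide-process-aggregate pattern. It uses plain Java threads as stand-ins for the smaller worker machines; the class name, sample data, and word-counting logic are purely illustrative and are not part of Hadoop's API.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class DivideAndAggregate {

    public static void main(String[] args) throws Exception {
        // The "large task": count the words in a collection of lines.
        List<String> lines = List.of(
                "the quick brown fox",
                "jumps over the lazy dog",
                "the dog barks",
                "the fox runs");

        int workers = 2; // stand-ins for the smaller machines
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        // Subdivide the data into roughly equal segments, one per worker.
        int chunkSize = (lines.size() + workers - 1) / workers;
        List<Future<Integer>> partials = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += chunkSize) {
            List<String> segment =
                    lines.subList(start, Math.min(start + chunkSize, lines.size()));
            // Each worker counts the words in its own portion of the data.
            Callable<Integer> task = () -> segment.stream()
                    .mapToInt(l -> l.split("\\s+").length)
                    .sum();
            partials.add(pool.submit(task));
        }

        // Aggregate the individual partial results into the final result.
        int total = 0;
        for (Future<Integer> partial : partials) {
            total += partial.get();
        }
        pool.shutdown();

        System.out.println("Total words: " + total); // prints 15
    }
}

In Hadoop, the same pattern plays out across many physical machines rather than threads, with the framework handling how the data is split, where each segment is processed, and how the partial results are brought back together.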
Although this may appear relatively simple in theory, there are various technical considerations to bear in mind. For example:

  • Is the network fast enough to collect the results from each individual server?
  • Can each individual server read data fast enough from the disk?
  • If one or more of the servers fail, do we have to start all over?
  • If there are multiple large tasks, how should they be prioritized?

Many more such questions must be addressed when working with a distributed architecture of this nature.
