An HDFS read operation from a client involves the following:
- The client asks the NameNode where the data blocks of a given file are stored.
- The NameNode responds with the block IDs and the locations of the hosts (DataNodes) where each block can be found.
- The client then contacts those DataNodes with the respective block IDs and fetches the blocks, preserving their order within the file.
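
The read steps above can be sketched with a toy in-memory model. This is not the Hadoop client API: the `NameNode`, `DataNode`, and `read_file` names, the block IDs, and the host names are all hypothetical, chosen only to show that the NameNode serves metadata while the DataNodes serve the bytes.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Toy DataNode (hypothetical): stores block bytes keyed by block ID."""
    host: str
    blocks: dict = field(default_factory=dict)

    def read_block(self, block_id):
        return self.blocks[block_id]


@dataclass
class NameNode:
    """Toy NameNode (hypothetical): holds metadata only, never file contents."""
    # path -> ordered list of (block_id, [hosts holding a replica])
    block_map: dict = field(default_factory=dict)

    def get_block_locations(self, path):
        if path not in self.block_map:
            raise FileNotFoundError(path)
        return self.block_map[path]


def read_file(path, namenode, datanodes):
    """Client-side read: ask for block locations, then fetch blocks in order."""
    chunks = []
    for block_id, hosts in namenode.get_block_locations(path):
        # Simplification: always use the first listed replica.
        chunks.append(datanodes[hosts[0]].read_block(block_id))
    return b"".join(chunks)


dn1 = DataNode("dn1", {"blk_1": b"hello ", "blk_2": b"world"})
dn2 = DataNode("dn2", {"blk_2": b"world"})
nn = NameNode({"/demo.txt": [("blk_1", ["dn1"]), ("blk_2", ["dn2", "dn1"])]})
datanodes = {"dn1": dn1, "dn2": dn2}

print(read_file("/demo.txt", nn, datanodes))  # prints b'hello world'
```

Because the block list returned by the toy NameNode is ordered, concatenating the fetched blocks in that order reconstructs the file, even though the blocks live on different hosts.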
An HDFS write operation from a client involves the following:
- The client asks the NameNode to add the new filename to the namespace; the NameNode verifies that the client has the necessary permissions.
- If the file already exists and the client did not request an overwrite, the NameNode throws an error; otherwise, the client receives an FSDataOutputStream, and the data written to this stream is placed on a data queue.
- As the data queue is consumed, the NameNode is asked to allocate new blocks on suitable DataNodes.
- The data is then copied to the first of those DataNodes and, as per the replication strategy, forwarded from that DataNode to the remaining DataNodes.
- It's important to note that the data itself never passes through the NameNode, as this would cause a performance bottleneck.
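
The write steps above can be sketched in the same spirit with a toy in-memory model. Nothing here is the Hadoop API: the class names, the `write_file` helper, the 4-byte block size, and the "first N hosts" placement rule are assumptions made for illustration, and real block placement is considerably more involved.

```python
class NameNode:
    """Toy NameNode (hypothetical): tracks namespace and placement, never data."""

    def __init__(self, hosts, replication=3):
        self.hosts = hosts
        self.replication = replication
        self.namespace = {}  # path -> list of (block_id, pipeline hosts)
        self.next_id = 0

    def create(self, path):
        # Reject the request if the file already exists.
        if path in self.namespace:
            raise FileExistsError(path)
        self.namespace[path] = []

    def add_block(self, path):
        block_id = f"blk_{self.next_id}"
        self.next_id += 1
        # Assumed placement rule: simply take the first N hosts.
        pipeline = self.hosts[: self.replication]
        self.namespace[path].append((block_id, pipeline))
        return block_id, pipeline


class DataNode:
    """Toy DataNode (hypothetical): stores a block, then forwards it downstream."""

    def __init__(self, host):
        self.host = host
        self.blocks = {}

    def write_block(self, block_id, data, downstream):
        self.blocks[block_id] = data
        if downstream:
            # Replication happens DataNode to DataNode, not via the NameNode.
            downstream[0].write_block(block_id, data, downstream[1:])


def write_file(path, data, namenode, datanodes, block_size=4):
    """Client-side write: register the file, then send each block to a pipeline."""
    namenode.create(path)
    for offset in range(0, len(data), block_size):
        block_id, pipeline = namenode.add_block(path)
        nodes = [datanodes[host] for host in pipeline]
        nodes[0].write_block(block_id, data[offset:offset + block_size], nodes[1:])


datanodes = {host: DataNode(host) for host in ("dn1", "dn2", "dn3")}
nn = NameNode(list(datanodes), replication=2)
write_file("/demo.txt", b"hello world", nn, datanodes)

print(datanodes["dn2"].blocks)  # replicas forwarded from dn1
print(datanodes["dn3"].blocks)  # prints {} because dn3 is outside the pipeline
```

In this sketch the `NameNode` object only ever sees paths and block IDs; the bytes travel from the client to the first DataNode and onward along the pipeline, which mirrors the point made in the last step of the list.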