Last time I briefly introduced HDFS, covering the definition of a Block and the overall architecture. In this post, I will continue with the processes of writing and reading data in HDFS, and I have made some pictures to simplify the textual description.

The process of writing data in HDFS

Step 1: The client requests the NameNode to upload a file through the DistributedFileSystem module, and the NameNode checks whether the target file already exists and whether its parent directory exists.

Step 2: The NameNode returns whether the file can be uploaded.

Step 3: The client requests to upload the first Block and asks the NameNode which DataNodes to write it to.

Step 4: The NameNode returns three DataNodes, namely DataNode 1, DataNode 2, and DataNode 3.

Step 5: Through the FSDataOutputStream module, the client asks DataNode 1 to receive the data. When DataNode 1 gets the request, it calls DataNode 2, and DataNode 2 in turn calls DataNode 3, establishing the communication pipeline.

Step 6: DataNode 1, DataNode 2, and DataNode 3 acknowledge the client step by step (the acknowledgement propagates back along the pipeline).

Step 7: The client starts uploading the first Block to DataNode 1 (it first reads the data from disk into a local memory cache and packages it into Packets). When DataNode 1 receives a Packet, it passes the Packet to DataNode 2, and DataNode 2 passes it on to DataNode 3. Each Packet that has been sent is placed in a reply queue, where it waits for acknowledgement.

Step 8: When the transmission of a Block is completed, the client requests the NameNode again for the DataNodes to store the second Block, and steps 3-7 are repeated for each remaining Block.
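
To make the write path concrete, here is a minimal client-side sketch using the Hadoop Java API. It assumes the Hadoop client libraries are on the classpath; the NameNode address (hdfs://localhost:9000), the file path, and the class name are placeholders of my own. The pipeline of steps 5-7 is driven entirely inside FSDataOutputStream, so the caller only sees an ordinary output stream.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.net.URI;
import java.nio.charset.StandardCharsets;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder address: point this at your own NameNode.
        FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9000"), conf);

        // create() corresponds to Steps 1-2: DistributedFileSystem asks the
        // NameNode to create the file entry after checking it does not exist.
        Path target = new Path("/user/demo/hello.txt");
        try (FSDataOutputStream out = fs.create(target)) {
            // write() buffers the bytes into Packets; the DataNode pipeline
            // of Steps 5-7 runs behind the scenes inside the stream.
            out.write("hello, hdfs".getBytes(StandardCharsets.UTF_8));
        } // close() flushes the remaining Packets and waits for the acks.

        fs.close();
    }
}
```

Note that closing the stream blocks until the outstanding Packets have been flushed and acknowledged, which is why a write only "succeeds" once the whole pipeline has confirmed it.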


The process of reading data in HDFS

Step 1: The client requests the NameNode to download a file through the DistributedFileSystem module, and the NameNode looks up its metadata to find the DataNode addresses where the file's Blocks are stored.

Step 2: The client chooses a DataNode (usually the nearest one) and requests to read the data.

Step 3: The DataNode starts transmitting the data to the client (it reads an input stream from disk and verifies the data with checksums as it goes).

Step 4: The client receives the data in Packets, caches them locally, and then writes them to the target file.
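
The read path is just as simple from the client's point of view. Below is a minimal sketch with the same assumptions as before (placeholder NameNode address and file path); the block-location lookup of step 1 happens inside open(), and the DataNode selection and checksum verification of steps 2-3 happen inside the returned stream.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

import java.net.URI;

public class HdfsReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9000"), conf);

        // open() corresponds to Step 1: the NameNode returns the list of
        // DataNodes holding each Block of the file.
        try (FSDataInputStream in = fs.open(new Path("/user/demo/hello.txt"))) {
            // The stream connects to a (preferably nearby) DataNode for each
            // Block and verifies checksums while copying the bytes out.
            IOUtils.copyBytes(in, System.out, 4096, false);
        }

        fs.close();
    }
}
```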


Summary

In fact, the description above is not complete. In HDFS, the distance between nodes is also a very important factor: it determines which nodes are chosen to store the replicas.
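
HDFS models the cluster as a tree (data center / rack / node) and measures the distance between two nodes as the total number of hops to their closest common ancestor: 0 for the same node, 2 for two nodes on the same rack, 4 for nodes on different racks. The toy function below (a sketch of the idea, not Hadoop's actual NetworkTopology class) computes that distance from path-style node names:

```java
public class TopologyDistance {
    // Nodes are identified by tree paths such as "/dc1/rack1/node1".
    static int distance(String a, String b) {
        String[] pa = a.split("/");
        String[] pb = b.split("/");
        int i = 0;
        // Walk down the tree while the two paths share an ancestor.
        while (i < pa.length && i < pb.length && pa[i].equals(pb[i])) {
            i++;
        }
        // Distance = hops from each node up to the common ancestor.
        return (pa.length - i) + (pb.length - i);
    }

    public static void main(String[] args) {
        System.out.println(distance("/dc1/rack1/node1", "/dc1/rack1/node1")); // 0: same node
        System.out.println(distance("/dc1/rack1/node1", "/dc1/rack1/node2")); // 2: same rack
        System.out.println(distance("/dc1/rack1/node1", "/dc1/rack2/node3")); // 4: different racks
    }
}
```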

Similarly, the way a DataNode stores data is also very interesting, so I'd like to mention it here. After the communication pipeline is established, the DataNode writes the data to memory and to disk simultaneously, which makes storage more efficient. At the same time, the DataNode also opens a queue to collect the reply messages from each node, ensuring that the data has been written successfully.
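
As a rough, single-threaded illustration of that reply queue: sent Packets move from a data queue to an ack queue and are only discarded once the last node in the pipeline has acknowledged them (on failure they would be put back for resending). The class and method names below are hypothetical; the real client uses separate sender and response-handling threads.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class PacketQueuesSketch {
    public static void main(String[] args) {
        Deque<String> dataQueue = new ArrayDeque<>(); // Packets waiting to be sent
        Deque<String> ackQueue = new ArrayDeque<>();  // sent Packets awaiting acks
        for (int i = 0; i < 3; i++) {
            dataQueue.add("packet-" + i);
        }

        while (!dataQueue.isEmpty()) {
            String pkt = dataQueue.poll();
            send(pkt);        // push the Packet down the pipeline (simulated)
            ackQueue.add(pkt); // remember it until the ack comes back
        }

        while (!ackQueue.isEmpty()) {
            String pkt = ackQueue.poll(); // ack received from the pipeline
            System.out.println("ack for " + pkt + " -> safe to discard");
        }
    }

    static void send(String pkt) {
        System.out.println("sending " + pkt + " to DataNode 1");
    }
}
```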

Having a basic understanding of the detailed processes of writing and reading data in HDFS helps us better understand how HDFS works, and also makes the concept of distribution in big data clearer.
