In the previous post, we saw that the NameNode supervises and manages the system, while DataNodes store data and execute commands. This time I want to explain how the NameNode learns what each DataNode stores and how it sends commands to them.

How does the DataNode work with the NameNode?

DataNode

  1. On a DataNode, each data block is stored on disk as two files: one holds the data itself, and the other holds metadata, including the length of the block, the checksum of the block data, and a timestamp.
  2. After a DataNode starts up, it registers with the NameNode.
  3. Once registration succeeds, the DataNode periodically (every 6 hours) reports all of its block information to the NameNode.
  4. The DataNode sends a heartbeat every 3 seconds. The reply to each heartbeat carries commands from the NameNode to the DataNode, such as copying a block to another machine or deleting a block. If the NameNode receives no heartbeat from a DataNode for more than 10 minutes, it considers that node unavailable.
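The registration, heartbeat, and timeout steps above can be sketched as a small in-memory model. This is only an illustration under stated assumptions: `ToyNameNode`, its methods, and the command tuple are hypothetical names and are not the real Hadoop classes or RPC protocol. The 3-second and 10-minute values are the ones quoted in the list.

```python
from dataclasses import dataclass, field

HEARTBEAT_INTERVAL_S = 3   # a DataNode sends a heartbeat every 3 seconds
DEAD_AFTER_S = 10 * 60     # no heartbeat for more than 10 minutes -> unavailable


@dataclass
class ToyNameNode:
    """Hypothetical in-memory model, not the real Hadoop implementation."""
    last_seen: dict = field(default_factory=dict)  # node id -> time of last heartbeat
    pending: dict = field(default_factory=dict)    # node id -> queued commands

    def register(self, node_id, now):
        # Step 2: the DataNode registers after it starts up.
        self.last_seen[node_id] = now
        self.pending[node_id] = []

    def queue_command(self, node_id, command):
        # Commands wait here until the node's next heartbeat.
        self.pending[node_id].append(command)

    def heartbeat(self, node_id, now):
        # Step 4: record the heartbeat and return queued commands in the reply.
        if node_id not in self.last_seen:
            raise KeyError(f"{node_id} has not registered")
        self.last_seen[node_id] = now
        commands, self.pending[node_id] = self.pending[node_id], []
        return commands

    def is_available(self, node_id, now):
        # The node is unavailable once the silence exceeds the timeout.
        return now - self.last_seen[node_id] <= DEAD_AFTER_S


nn = ToyNameNode()
nn.register("dn1", now=0)
nn.queue_command("dn1", ("DELETE_BLOCK", "blk_1"))
reply = nn.heartbeat("dn1", now=HEARTBEAT_INTERVAL_S)
print(reply)                                 # commands delivered with the reply
print(nn.is_available("dn1", now=3 + 600))   # exactly 10 minutes of silence
print(nn.is_available("dn1", now=3 + 601))   # more than 10 minutes of silence
```

Passing `now` explicitly instead of reading the system clock keeps the sketch deterministic, so the timeout boundary is easy to check by hand.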

Ensure data integrity

Consider a problem. Suppose the data on a computer's disk holds the red and green signals that control a high-speed rail signal light, but the disk is faulty and the light always shows green. That would be very dangerous. In the same way, if the data on a DataNode is damaged, how can we guard against this danger?

DataNode has its own method to ensure data integrity.

When a DataNode reads a block, it computes the block's checksum. If the computed checksum differs from the value recorded when the block was created, the block is damaged, and the client reads a replica of the block from another DataNode.
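The read-time check can be illustrated with a minimal sketch. It assumes CRC-32 from Python's `zlib` as the checksum and uses a hypothetical `read_block` helper with a made-up list of replicas. It does not reflect the actual HDFS client code.

```python
import zlib


def checksum(data: bytes) -> int:
    # CRC-32 of the block data; the algorithm choice is an assumption of this sketch.
    return zlib.crc32(data)


def read_block(replicas):
    """Return the first replica whose data matches its stored checksum.

    `replicas` is a list of (node_name, data, stored_checksum) tuples,
    a hypothetical stand-in for copies of one block on different DataNodes.
    """
    for node, data, stored in replicas:
        if checksum(data) == stored:
            return node, data
        # Mismatch: treat this copy as damaged and try the next DataNode.
    raise IOError("all replicas failed checksum verification")


original = b"signal=red"
stored = checksum(original)          # recorded when the block was created

replicas = [
    ("dn1", b"signal=grn", stored),  # corrupted copy
    ("dn2", original, stored),       # healthy copy
]
node, data = read_block(replicas)
print(node, data)
```

Here the copy on `dn1` fails verification, so the read falls through to `dn2`, which mirrors the client switching to another DataNode.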

Common checksum algorithms include CRC (32 bits), MD5 (128 bits), and SHA-1 (160 bits). The DataNode also periodically re-validates the checksums of its block files after they are created.
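As a quick check of the sizes quoted above, the following sketch computes all three checksums with Python's standard `zlib` and `hashlib` modules. The sample data is arbitrary, and the snippet says nothing about which algorithm a given HDFS deployment uses.

```python
import hashlib
import zlib

data = b"example block data"  # arbitrary sample bytes

crc = zlib.crc32(data)               # unsigned integer that fits in 32 bits
md5 = hashlib.md5(data).digest()     # 16 bytes
sha1 = hashlib.sha1(data).digest()   # 20 bytes

print("CRC-32 fits in 32 bits:", crc < 2**32)
print("MD5 bits:", len(md5) * 8)
print("SHA-1 bits:", len(sha1) * 8)
```

A shorter checksum is cheaper to compute and store for every block, while a longer digest makes it less likely that two different inputs produce the same value.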


Summary

The DataNode is where HDFS actually stores file data, and its communication with the NameNode lets HDFS learn what is stored on each DataNode and send commands to it. The DataNode also protects data integrity by verifying checksums. In addition, the heartbeat mechanism lets the NameNode check whether a DataNode is online and mark it as dead after a certain period without contact.

Last modification: March 21, 2024