Hadoop HDFS Summary: NameNode Part 3, DatanodeDescriptor

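DatanodeDescriptor is the NameNode's abstraction of a DataNode. It is an internal NameNode data structure that, together with BlocksMap and INode, records which blocks every DataNode in the file system holds and which INode each of those blocks belongs to.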

DatanodeDescriptor extends DatanodeInfo, which in turn extends DatanodeID.

I. DatanodeID

DatanodeID has the following fields:

public String name; /// hostname:portNumber
public String storageID; /// unique per cluster storageID (a cluster-wide unique storage identifier, not a hostname)
protected int infoPort; /// the port where the infoserver is running
public int ipcPort; /// the port where the ipc server is running (low-level IPC communication)

II. DatanodeInfo

1. DatanodeInfo has the following fields:

protected long capacity;
protected long dfsUsed;
protected long remaining;

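protected String hostName = null; // supplied by the DataNode when it registers
protected long lastUpdate;
protected int xceiverCount; // important: the number of connections this DataNode currently has open to clients or to other DataNodes; errors occur once the limit is exceeded
protected String location = NetworkTopology.DEFAULT_RACK; // position in the network topology; configurable, and used by the rack-aware replica placement policy

protected AdminStates adminState; // operational state of this DataNode: NORMAL, DECOMMISSION_INPROGRESS, or DECOMMISSIONED

adminState matters when a DataNode is decommissioned, that is, taken out of service. To prevent data loss, the blocks held by the DataNode must be copied to other DataNodes while decommissioning is in progress.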

2. Important methods

public String dumpDatanode() formats all of the fields and statistics above into a string for output.

III. DatanodeDescriptor

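DatanodeDescriptor abstracts every operation the NameNode performs on a DataNode. A DataNode stores the actual data of the file system: the data belongs to files, each file consists of multiple blocks, and each block has multiple replicas. The main operations are these: clients transfer data to the DataNode; the DataNode's blocks must all be tracked; if data is lost, blocks must be re-replicated (replicate); and if an error occurs during an append or a transfer, the affected blocks must be recovered (recovery). DatanodeDescriptor encapsulates all of these operations.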

1. Important data structures

(1) Inner class BlockTargetPair

  public static class BlockTargetPair {
    public final Block block;
    public final DatanodeDescriptor[] targets;    

    BlockTargetPair(Block block, DatanodeDescriptor[] targets) {
      this.block = block;
      this.targets = targets;
    }
  }

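It pairs a block with the DataNodes (targets) on which the block's replicas are to be stored, and serves as the building block for the structures below.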

(2) Inner class: private static class BlockQueue

Wraps a queue of BlockTargetPair objects and provides methods such as enqueue and dequeue.
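As a rough illustration of what such a wrapper might look like, here is a minimal self-contained sketch. It is not the Hadoop source: PairSketch and BlockQueueSketch are hypothetical stand-ins, blocks are reduced to a long id, and targets to node names. The batching rule in poll (stop once the next pair would exceed a target budget) is an assumption, suggested by the maxTransfers parameter of the command methods described later.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-in for BlockTargetPair: a block id plus target node names.
final class PairSketch {
    final long blockId;
    final String[] targets;

    PairSketch(long blockId, String[] targets) {
        this.blockId = blockId;
        this.targets = targets;
    }
}

// Hypothetical sketch of a BlockQueue-like wrapper around a FIFO queue.
final class BlockQueueSketch {
    private final Queue<PairSketch> queue = new LinkedList<>();

    synchronized int size() {
        return queue.size();
    }

    // Enqueue one block together with its targets.
    synchronized boolean offer(long blockId, String[] targets) {
        return queue.offer(new PairSketch(blockId, targets));
    }

    // Dequeue pairs from the head for as long as the next pair's targets
    // still fit into the remaining budget (assumed batching rule).
    synchronized List<PairSketch> poll(int maxTargets) {
        List<PairSketch> out = new ArrayList<>();
        while (!queue.isEmpty() && queue.peek().targets.length <= maxTargets) {
            PairSketch p = queue.poll();
            maxTargets -= p.targets.length;
            out.add(p);
        }
        return out;
    }
}
```

All methods are synchronized on the queue object, so a thread that schedules work and a thread that builds a heartbeat reply can share one instance.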

(3)private volatile BlockInfo blockList = null;

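Each DatanodeDescriptor must record every block stored on its DataNode, and it does so through BlockInfo. blockList is the head node of a linked list that is threaded through the BlockInfo triplets (see the BlocksMap analysis); walking the list from the head yields all of the DataNode's blocks.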
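The triplet layout can be pictured with the following simplified model. This is a sketch under stated assumptions, not the Hadoop implementation: NodeSketch and BlockSketch are hypothetical names, and the array has a fixed size (one triplet per replica slot) instead of growing on demand. For replica slot i, the block stores the node, the previous block on that node, and the next block on that node.

```java
// Hypothetical stand-in for DatanodeDescriptor: holds only the list head.
final class NodeSketch {
    final String name;
    BlockSketch blockList; // head of this node's block list

    NodeSketch(String name) {
        this.name = name;
    }

    // Same shape as addBlock in the article: register the node on the block,
    // then insert the block at the head of the node's list.
    boolean addBlock(BlockSketch b) {
        if (!b.addNode(this)) {
            return false;
        }
        blockList = b.listInsert(blockList, this);
        return true;
    }
}

// Hypothetical stand-in for BlockInfo with a fixed-size triplets array.
final class BlockSketch {
    final long id;
    // triplets[3*i] = node, triplets[3*i+1] = previous block on that node,
    // triplets[3*i+2] = next block on that node.
    private final Object[] triplets;

    BlockSketch(long id, int replication) {
        this.id = id;
        this.triplets = new Object[3 * replication];
    }

    // Index of the slot that refers to the given node, or -1 if absent.
    int findNode(NodeSketch n) {
        for (int i = 0; i < triplets.length / 3; i++) {
            if (triplets[3 * i] == n) {
                return i;
            }
        }
        return -1;
    }

    // Claim a free slot for the node; false if already present or no slot is free.
    boolean addNode(NodeSketch n) {
        if (findNode(n) >= 0) {
            return false;
        }
        for (int i = 0; i < triplets.length / 3; i++) {
            if (triplets[3 * i] == null) {
                triplets[3 * i] = n;
                return true;
            }
        }
        return false;
    }

    BlockSketch getNext(NodeSketch n) {
        int i = findNode(n);
        return i < 0 ? null : (BlockSketch) triplets[3 * i + 2];
    }

    // Link this block in front of head on the given node's list; returns the new head.
    BlockSketch listInsert(BlockSketch head, NodeSketch n) {
        int i = findNode(n);
        triplets[3 * i + 1] = null;
        triplets[3 * i + 2] = head;
        if (head != null) {
            head.triplets[3 * head.findNode(n) + 1] = this;
        }
        return this;
    }
}
```

Because the links live inside the block objects themselves, one block can sit on several nodes' lists at once without any extra list-node allocations.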

(4) Internal structures:

  /** A queue of blocks to be replicated by this datanode */
  private BlockQueue replicateBlocks = new BlockQueue();
  /** A queue of blocks to be recovered by this datanode */
  private BlockQueue recoverBlocks = new BlockQueue();
  /** A set of blocks to be invalidated by this datanode */
  private Set<Block> invalidateBlocks = new TreeSet<Block>();

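These internal structures hold, respectively: the blocks that this DataNode must copy to other DataNodes (replicateBlocks), the blocks that this DataNode must recover together with the other DataNodes holding replicas (recoverBlocks), and the blocks that must be deleted from this DataNode (invalidateBlocks).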

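The first two structures need references to other DatanodeDescriptors, because replication and recovery must know which DataNodes are involved. Invalidation only concerns blocks that this DataNode itself has to delete, so it involves no other DataNode.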

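(5) The following variables track block scheduling: an approximate count of the blocks scheduled for this DataNode, rolled over at a fixed interval (10 minutes) when heartbeats arrive.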

  private int currApproxBlocksScheduled = 0;
  private int prevApproxBlocksScheduled = 0;
  private long lastBlocksScheduledRollTime = 0;
  private static final int BLOCKS_SCHEDULED_ROLL_INTERVAL = 600*1000; //10min
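One way to read these four variables is as a two-bucket rolling counter: increments go into the current bucket, and once the interval has elapsed the current bucket becomes the previous one. The sketch below illustrates that idea only. ScheduledCounterSketch and its method names are hypothetical, and the decrement policy (drain the older bucket first) is an assumption.

```java
// Hypothetical two-bucket rolling counter; the time is passed in explicitly.
final class ScheduledCounterSketch {
    private static final long ROLL_INTERVAL_MS = 600 * 1000L; // 10 min

    private int curr = 0;      // blocks scheduled in the current interval
    private int prev = 0;      // blocks scheduled in the previous interval
    private long lastRoll = 0; // time of the last roll, in milliseconds

    void increment() {
        curr++;
    }

    // Assumed policy: take from the older bucket first.
    void decrement() {
        if (prev > 0) {
            prev--;
        } else if (curr > 0) {
            curr--;
        }
    }

    // Approximate number of blocks currently scheduled.
    int total() {
        return curr + prev;
    }

    // Called with the current time, e.g. on every heartbeat.
    void roll(long nowMs) {
        if (nowMs - lastRoll > ROLL_INTERVAL_MS) {
            prev = curr;
            curr = 0;
            lastRoll = nowMs;
        }
    }
}
```

With this design a count that is never explicitly decremented, for example because a write failed silently, ages out after at most two intervals instead of accumulating forever.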

2. Important methods

(1)void updateHeartbeat

  void updateHeartbeat(long capacity, long dfsUsed, long remaining,
      int xceiverCount) {
    this.capacity = capacity;
    this.dfsUsed = dfsUsed;
    this.remaining = remaining;
    this.lastUpdate = System.currentTimeMillis();
    this.xceiverCount = xceiverCount;
    rollBlocksScheduled(lastUpdate);
  }

Called when a DataNode sends a heartbeat to the NameNode. It updates the node's state (capacity, dfsUsed, remaining, and xceiverCount) and sets lastUpdate to the current time.

(2)boolean addBlock(BlockInfo b)

  boolean addBlock(BlockInfo b) {
    if(!b.addNode(this))
      return false;
    // add to the head of the data-node list
    blockList = b.listInsert(blockList, this);
    return true;
  }

Inserts the block at the head of this DataNode's block list.

(3)boolean removeBlock(BlockInfo b)

  boolean removeBlock(BlockInfo b) {
    blockList = b.listRemove(blockList, this);
    return b.removeNode(this);
  }

Removes the block from the list.

(4)void addBlockToBeReplicated

  void addBlockToBeReplicated(Block block, DatanodeDescriptor[] targets) {
    assert(block != null && targets != null && targets.length > 0);
    replicateBlocks.offer(block, targets);
  }

Adds the block to the replicateBlocks structure.

(5)void addBlockToBeRecovered

  void addBlockToBeRecovered(Block block, DatanodeDescriptor[] targets) {
    assert(block != null && targets != null && targets.length > 0);
    recoverBlocks.offer(block, targets);
  }

Adds the block to the recoverBlocks structure.

(6)void addBlocksToBeInvalidated

  void addBlocksToBeInvalidated(List<Block> blocklist) {
    assert(blocklist != null && blocklist.size() > 0);
    synchronized (invalidateBlocks) {
      for(Block blk : blocklist) {
        invalidateBlocks.add(blk);
      }
    }
  }

Adds the blocks to the invalidateBlocks structure.

(7) BlockCommand getReplicationCommand(int maxTransfers)

BlockCommand getLeaseRecoveryCommand(int maxTransfers)

BlockCommand getInvalidateBlocks(int maxblocks)

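These three methods follow the same pattern: each packages the contents of one of the three internal structures into a Writable form to be sent to the corresponding DataNode, and sets the command to DatanodeProtocol.DNA_TRANSFER, DatanodeProtocol.DNA_RECOVERBLOCK, or DatanodeProtocol.DNA_INVALIDATE respectively.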

(8) reportDiff: the most important method in DatanodeDescriptor

void reportDiff(BlocksMap blocksMap,
                  BlockListAsLongs newReport,
                  Collection<Block> toAdd,
                  Collection<Block> toRemove,
                  Collection<Block> toInvalidate) {
    // place a delimiter in the list which separates blocks 
    // that have been reported from those that have not
    BlockInfo delimiter = new BlockInfo(new Block(), 1);
    boolean added = this.addBlock(delimiter);
    assert added : "Delimiting block cannot be present in the node";
    if(newReport == null)
      newReport = new BlockListAsLongs( new long[0]);
    // scan the report and collect newly reported blocks
    // Note we are taking special precaution to limit tmp blocks allocated
    // as part of this block report, which is why the block list is stored as longs
    Block iblk = new Block(); // a fixed new'ed block to be reused with index i
    for (int i = 0; i < newReport.getNumberOfBlocks(); ++i) {
      iblk.set(newReport.getBlockId(i), newReport.getBlockLen(i), 
               newReport.getBlockGenStamp(i));
      BlockInfo storedBlock = blocksMap.getStoredBlock(iblk);
      if(storedBlock == null) {
        // If block is not in blocksMap it does not belong to any file
        toInvalidate.add(new Block(iblk));
        continue;
      }
      if(storedBlock.findDatanode(this) < 0) {// Known block, but not on the DN
        // if the size differs from what is in the blockmap, then return
        // the new block. addStoredBlock will then pick up the right size of this
        // block and will update the block object in the BlocksMap
        if (storedBlock.getNumBytes() != iblk.getNumBytes()) {
          toAdd.add(new Block(iblk));
        } else {
          toAdd.add(storedBlock);
        }
        continue;
      }
      // move block to the head of the list
      this.moveBlockToHead(storedBlock);
    }
    // collect blocks that have not been reported
    // all of them are next to the delimiter
    Iterator<Block> it = new BlockIterator(delimiter.getNext(0), this);
    while(it.hasNext())
      toRemove.add(it.next());
    this.removeBlock(delimiter);
  }

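Each DataNode periodically sends a block report to the NameNode. Because a report is expensive to process, reports are not sent very frequently. When a report arrives, every reported block is compared against BlocksMap. A block that BlocksMap does not contain belongs to no file and goes into toInvalidate, to be deleted. A block that BlocksMap knows but that is not yet recorded for this DataNode goes into toAdd, so that it is added to the DataNode-to-block mapping. A block that is already recorded is moved to the head of the list, in front of the delimiter. Whatever remains behind the delimiter afterwards was recorded for this DataNode but not reported, and goes into toRemove.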
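The three-way classification can be sketched with plain sets, leaving out the delimiter trick and the linked list. This is a simplified illustration, not the method above: ReportDiffSketch is a hypothetical name, blocks are reduced to ids, and the length comparison that decides which object goes into toAdd is ignored.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ReportDiffSketch {
    // knownToNameNode: ids present in the block map (stand-in for BlocksMap).
    // recordedOnNode:  ids the NameNode currently associates with this DataNode.
    // report:          ids the DataNode has just reported.
    static void reportDiff(Set<Long> knownToNameNode,
                           Set<Long> recordedOnNode,
                           List<Long> report,
                           Collection<Long> toAdd,
                           Collection<Long> toRemove,
                           Collection<Long> toInvalidate) {
        // Start by assuming nothing was reported; reported ids are struck off below.
        Set<Long> unreported = new LinkedHashSet<>(recordedOnNode);
        for (Long id : report) {
            if (!knownToNameNode.contains(id)) {
                toInvalidate.add(id);   // belongs to no file: tell the DataNode to delete it
            } else if (!recordedOnNode.contains(id)) {
                toAdd.add(id);          // known block, newly seen on this DataNode
            } else {
                unreported.remove(id);  // already recorded and still present
            }
        }
        // Recorded for this DataNode but missing from the report.
        toRemove.addAll(unreported);
    }
}
```

For example, with known ids {1, 2, 3, 4}, recorded ids {1, 2, 3}, and a report of [1, 2, 4, 9], the sketch puts 4 into toAdd, 3 into toRemove, and 9 into toInvalidate. The real method gets the same partition without allocating a temporary set: it moves each confirmed block in front of a delimiter node, so the unreported blocks are exactly those left behind it.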

Original article: https://www.cnblogs.com/sidmeng/p/2416865.html