• RocketMQ中Broker的HA策略源码分析


    Broker的HA策略分为两部分
    ①同步元数据
    ②同步消息数据

    同步元数据

    在Slave启动时,会启动一个定时任务用来从master同步元数据

     1 if (role == BrokerRole.SLAVE) {
     2     if (null != slaveSyncFuture) {
     3         slaveSyncFuture.cancel(false);
     4     }
     5     this.slaveSynchronize.setMasterAddr(null);
     6     slaveSyncFuture = this.scheduledExecutorService.scheduleAtFixedRate(new Runnable() {
     7         @Override
     8         public void run() {
     9             try {
    10                 BrokerController.this.slaveSynchronize.syncAll();
    11             }
    12             catch (Throwable e) {
    13                 log.error("ScheduledTask SlaveSynchronize syncAll error.", e);
    14             }
    15         }
    16     }, 1000 * 3, 1000 * 10, TimeUnit.MILLISECONDS);
    17 } 

    这里设置了定时任务,执行slaveSynchronize的syncAll方法
    可以注意在之前会通过setMasterAddr将Master的地址设为null,这是由于在后面会通过另一个定时任务registerBrokerAll来向NameServer获取Master的地址,详见:

    【RocketMQ中Broker的启动源码分析(二)】


    SlaveSynchronize的syncAll方法:

    1 public void syncAll() {
    2     this.syncTopicConfig();
    3     this.syncConsumerOffset();
    4     this.syncDelayOffset();
    5     this.syncSubscriptionGroupConfig();
    6 }

    这个方法会依次调用四个方法,来同步相应信息:
    syncTopicConfig:同步topic的配置信息
    syncConsumerOffset:同步Consumer的Offset信息
    syncDelayOffset:同步延迟队列信息
    syncSubscriptionGroupConfig:同步订阅信息


    由于这几个方法的实现是类似的,这里就只看下syncTopicConfig的实现:
    syncTopicConfig方法:

     1 private void syncTopicConfig() {
     2     String masterAddrBak = this.masterAddr;
     3     if (masterAddrBak != null && !masterAddrBak.equals(brokerController.getBrokerAddr())) {
     4         try {
     5             TopicConfigSerializeWrapper topicWrapper =
     6                 this.brokerController.getBrokerOuterAPI().getAllTopicConfig(masterAddrBak);
     7             if (!this.brokerController.getTopicConfigManager().getDataVersion()
     8                 .equals(topicWrapper.getDataVersion())) {
     9 
    10                 this.brokerController.getTopicConfigManager().getDataVersion()
    11                     .assignNewOne(topicWrapper.getDataVersion());
    12                 this.brokerController.getTopicConfigManager().getTopicConfigTable().clear();
    13                 this.brokerController.getTopicConfigManager().getTopicConfigTable()
    14                     .putAll(topicWrapper.getTopicConfigTable());
    15                 this.brokerController.getTopicConfigManager().persist();
    16 
    17                 log.info("Update slave topic config from master, {}", masterAddrBak);
    18             }
    19         } catch (Exception e) {
    20             log.error("SyncTopicConfig Exception, {}", masterAddrBak, e);
    21         }
    22     }
    23 }

    这里首先获取master的地址masterAddr,由于registerBrokerAll定时任务的存在,即便这一次没有获取到masterAddr,只要节点中有master,总会在后面定时执行时从NameServer中获取到


    当获取到master地址后,通过BrokerOuterAPI的getAllTopicConfig方法,向master请求
    BrokerOuterAPI的getAllTopicConfig方法:

     1 public TopicConfigSerializeWrapper getAllTopicConfig(
     2     final String addr) throws RemotingConnectException, RemotingSendRequestException,
     3     RemotingTimeoutException, InterruptedException, MQBrokerException {
     4     RemotingCommand request = RemotingCommand.createRequestCommand(RequestCode.GET_ALL_TOPIC_CONFIG, null);
     5 
     6     RemotingCommand response = this.remotingClient.invokeSync(MixAll.brokerVIPChannel(true, addr), request, 3000);
     7     assert response != null;
     8     switch (response.getCode()) {
     9         case ResponseCode.SUCCESS: {
    10             return TopicConfigSerializeWrapper.decode(response.getBody(), TopicConfigSerializeWrapper.class);
    11         }
    12         default:
    13             break;
    14     }
    15 
    16     throw new MQBrokerException(response.getCode(), response.getRemark());
    17 }

    首先构建GET_ALL_TOPIC_CONFIG求情指令,然后通过remotingClient的invokeSync进行同步发送,注意这里会通过MixAll的brokerVIPChannel方法,得到对应的master地址的VIP通道地址,就是端口号减2,这在我之前的博客中介绍过
    有关同步发送在  【RocketMQ中Producer消息的发送源码分析】 中详细介绍过

    请求发送给master后,来看看master是怎么处理的
    master端在收到请求后会通过AdminBrokerProcessor的processRequest方法判别请求指令:

    1 case RequestCode.GET_ALL_TOPIC_CONFIG:
    2     return this.getAllTopicConfig(ctx, request);

    执行getAllTopicConfig方法:

     1 private RemotingCommand getAllTopicConfig(ChannelHandlerContext ctx, RemotingCommand request) {
     2     final RemotingCommand response = RemotingCommand.createResponseCommand(GetAllTopicConfigResponseHeader.class);
     3     // final GetAllTopicConfigResponseHeader responseHeader =
     4     // (GetAllTopicConfigResponseHeader) response.readCustomHeader();
     5 
     6     String content = this.brokerController.getTopicConfigManager().encode();
     7     if (content != null && content.length() > 0) {
     8         try {
     9             response.setBody(content.getBytes(MixAll.DEFAULT_CHARSET));
    10         } catch (UnsupportedEncodingException e) {
    11             log.error("", e);
    12 
    13             response.setCode(ResponseCode.SYSTEM_ERROR);
    14             response.setRemark("UnsupportedEncodingException " + e);
    15             return response;
    16         }
    17     } else {
    18         log.error("No topic in this broker, client: {}", ctx.channel().remoteAddress());
    19         response.setCode(ResponseCode.SYSTEM_ERROR);
    20         response.setRemark("No topic in this broker");
    21         return response;
    22     }
    23 
    24     response.setCode(ResponseCode.SUCCESS);
    25     response.setRemark(null);
    26 
    27     return response;
    28 }

    这里会将TopicConfigManager中保存的topicConfigTable:

    1 private final ConcurrentMap<String, TopicConfig> topicConfigTable =
    2         new ConcurrentHashMap<String, TopicConfig>(1024);

    将这个map通过encode方法转换成json字符串,再通过Netty发送给slave


    回到slave中,在同步发送的情况下,会等待会送响应,收到响应后:

    1 switch (response.getCode()) {
    2     case ResponseCode.SUCCESS: {
    3         return TopicConfigSerializeWrapper.decode(response.getBody(), TopicConfigSerializeWrapper.class);
    4     }
    5     default:
    6         break;
    7 }

    通过decode解码,将json字符串转换为map封装在 TopicConfigSerializeWrapper中


    回到syncTopicConfig方法中:
    得到TopicConfigSerializeWrapper实例后

     1 if (!this.brokerController.getTopicConfigManager().getDataVersion()
     2     .equals(topicWrapper.getDataVersion())) {
     3 
     4     this.brokerController.getTopicConfigManager().getDataVersion()
     5         .assignNewOne(topicWrapper.getDataVersion());
     6     this.brokerController.getTopicConfigManager().getTopicConfigTable().clear();
     7     this.brokerController.getTopicConfigManager().getTopicConfigTable()
     8         .putAll(topicWrapper.getTopicConfigTable());
     9     this.brokerController.getTopicConfigManager().persist();
    10 
    11     log.info("Update slave topic config from master, {}", masterAddrBak);
    12 }

    判断版本是否一致,若不一致,会进行替换,这样slave的Topic配置信息就和master保持同步了

    其他三种信息的同步同理

     同步消息数据

    在master启动时,会通过JDK的NIO方式启动一个HA服务线程,用以处理slave的连接:

     1 public void run() {
     2     log.info(this.getServiceName() + " service started");
     3 
     4     while (!this.isStopped()) {
     5         try {
     6             this.selector.select(1000);
     7             Set<SelectionKey> selected = this.selector.selectedKeys();
     8 
     9             if (selected != null) {
    10                 for (SelectionKey k : selected) {
    11                     if ((k.readyOps() & SelectionKey.OP_ACCEPT) != 0) {
    12                         SocketChannel sc = ((ServerSocketChannel) k.channel()).accept();
    13 
    14                         if (sc != null) {
    15                             HAService.log.info("HAService receive new connection, "
    16                                 + sc.socket().getRemoteSocketAddress());
    17 
    18                             try {
    19                                 HAConnection conn = new HAConnection(HAService.this, sc);
    20                                 conn.start();
    21                                 HAService.this.addConnection(conn);
    22                             } catch (Exception e) {
    23                                 log.error("new HAConnection exception", e);
    24                                 sc.close();
    25                             }
    26                         }
    27                     } else {
    28                         log.warn("Unexpected ops in select " + k.readyOps());
    29                     }
    30                 }
    31 
    32                 selected.clear();
    33             }
    34         } catch (Exception e) {
    35             log.error(this.getServiceName() + " service has exception.", e);
    36         }
    37     }
    38 
    39     log.info(this.getServiceName() + " service end");
    40 }

    这里就是非常典型的JDK NIO的使用,在侦听到连接取得SocketChannel后,将其封装为HAConnection

     1 public HAConnection(final HAService haService, final SocketChannel socketChannel) throws IOException {
     2     this.haService = haService;
     3     this.socketChannel = socketChannel;
     4     this.clientAddr = this.socketChannel.socket().getRemoteSocketAddress().toString();
     5     this.socketChannel.configureBlocking(false);
     6     this.socketChannel.socket().setSoLinger(false, -1);
     7     this.socketChannel.socket().setTcpNoDelay(true);
     8     this.socketChannel.socket().setReceiveBufferSize(1024 * 64);
     9     this.socketChannel.socket().setSendBufferSize(1024 * 64);
    10     this.writeSocketService = new WriteSocketService(this.socketChannel);
    11     this.readSocketService = new ReadSocketService(this.socketChannel);
    12     this.haService.getConnectionCount().incrementAndGet();
    13 }

    在构造方法内进行了对socketChannel的一些配置,还创建了一个WriteSocketService和一个ReadSocketService,这两个是后续处理消息同步的基础


    在创建完HAConnection后,调用其start方法:

    1 public void start() {
    2     this.readSocketService.start();
    3     this.writeSocketService.start();
    4 }

    这里会启动两个线程,分别处理读取slave发送的数据,以及向slave发送数据


    到这里,先不急着分析master了,来看看slave端
    slave在启动时,会启动HAClient的线程:

     1 public void run() {
     2     log.info(this.getServiceName() + " service started");
     3 
     4     while (!this.isStopped()) {
     5         try {
     6             if (this.connectMaster()) {
     7 
     8                 if (this.isTimeToReportOffset()) {
     9                     boolean result = this.reportSlaveMaxOffset(this.currentReportedOffset);
    10                     if (!result) {
    11                         this.closeMaster();
    12                     }
    13                 }
    14 
    15                 this.selector.select(1000);
    16 
    17                 boolean ok = this.processReadEvent();
    18                 if (!ok) {
    19                     this.closeMaster();
    20                 }
    21 
    22                 if (!reportSlaveMaxOffsetPlus()) {
    23                     continue;
    24                 }
    25 
    26                 long interval =
    27                     HAService.this.getDefaultMessageStore().getSystemClock().now()
    28                         - this.lastWriteTimestamp;
    29                 if (interval > HAService.this.getDefaultMessageStore().getMessageStoreConfig()
    30                     .getHaHousekeepingInterval()) {
    31                     log.warn("HAClient, housekeeping, found this connection[" + this.masterAddress
    32                         + "] expired, " + interval);
    33                     this.closeMaster();
    34                     log.warn("HAClient, master not response some time, so close connection");
    35                 }
    36             } else {
    37                 this.waitForRunning(1000 * 5);
    38             }
    39         } catch (Exception e) {
    40             log.warn(this.getServiceName() + " service has exception. ", e);
    41             this.waitForRunning(1000 * 5);
    42         }
    43     }
    44 
    45     log.info(this.getServiceName() + " service end");
    46 }

    在这个while循环中,首先通过connectMaster检查是否和master连接了


    connectMaster方法:

     1 private boolean connectMaster() throws ClosedChannelException {
     2     if (null == socketChannel) {
     3         String addr = this.masterAddress.get();
     4         if (addr != null) {
     5 
     6             SocketAddress socketAddress = RemotingUtil.string2SocketAddress(addr);
     7             if (socketAddress != null) {
     8                 this.socketChannel = RemotingUtil.connect(socketAddress);
     9                 if (this.socketChannel != null) {
    10                     this.socketChannel.register(this.selector, SelectionKey.OP_READ);
    11                 }
    12             }
    13         }
    14 
    15         this.currentReportedOffset = HAService.this.defaultMessageStore.getMaxPhyOffset();
    16 
    17         this.lastWriteTimestamp = System.currentTimeMillis();
    18     }
    19 
    20     return this.socketChannel != null;
    21 }

    若是socketChannel为null,意味着并没有产生连接,或者连接断开
    需要重新根据masterAddress建立网络连接

    只要是需要建立连接,都需要通过defaultMessageStore的getMaxPhyOffset方法,获取本地最大的Offset,由currentReportedOffset保存,后续用于向master报告;以及保存了一个时间戳lastWriteTimestamp,用于之后的校对


    当确保与master的连接建立成功后,通过isTimeToReportOffset方法,检查是否需要向master报告当前的最大Offset

    isTimeToReportOffset方法:

    1 private boolean isTimeToReportOffset() {
    2     long interval =
    3         HAService.this.defaultMessageStore.getSystemClock().now() - this.lastWriteTimestamp;
    4     boolean needHeart = interval > HAService.this.defaultMessageStore.getMessageStoreConfig()
    5         .getHaSendHeartbeatInterval();
    6 
    7     return needHeart;
    8 }

    这里就通过lastWriteTimestamp和当前时间检查,判断是否达到了报告时间间隔HaSendHeartbeatInterval,默认5s


    若是达到了,就需要通过reportSlaveMaxOffset方法,将记录的currentReportedOffset这个最大的offset发送给master


    reportSlaveMaxOffset方法:

     1 private boolean reportSlaveMaxOffset(final long maxOffset) {
     2     this.reportOffset.position(0);
     3     this.reportOffset.limit(8);
     4     this.reportOffset.putLong(maxOffset);
     5     this.reportOffset.position(0);
     6     this.reportOffset.limit(8);
     7 
     8     for (int i = 0; i < 3 && this.reportOffset.hasRemaining(); i++) {
     9         try {
    10             this.socketChannel.write(this.reportOffset);
    11         } catch (IOException e) {
    12             log.error(this.getServiceName()
    13                 + "reportSlaveMaxOffset this.socketChannel.write exception", e);
    14             return false;
    15         }
    16     }
    17 
    18     return !this.reportOffset.hasRemaining();
    19 }

    其中reportOffset是专门用来缓存offset的ByteBuffer

    1 private final ByteBuffer reportOffset = ByteBuffer.allocate(8);

    将maxOffset存放在reportOffset中,然后通过socketChannel的write方法,完成向master的发送

    其中hasRemaining方法用来检查当前位置是否已经达到缓冲区极限limit,确保reportOffset 中的内容能被完全发送出去

    发送成功后,会调用selector的select方法,在超时时间内进行NIO的轮询,等待master的回送


    通过这我们可以看出slave在和master建立连接后,会定时向master报告自己当前的offset


    来看看master收到offset后是如何处理的:

    在master端会通过前面提到的ReadSocketService线程进行处理:

     1 public void run() {
     2     HAConnection.log.info(this.getServiceName() + " service started");
     3 
     4     while (!this.isStopped()) {
     5         try {
     6             this.selector.select(1000);
     7             boolean ok = this.processReadEvent();
     8             if (!ok) {
     9                 HAConnection.log.error("processReadEvent error");
    10                 break;
    11             }
    12 
    13             long interval = HAConnection.this.haService.getDefaultMessageStore().getSystemClock().now() - this.lastReadTimestamp;
    14             if (interval > HAConnection.this.haService.getDefaultMessageStore().getMessageStoreConfig().getHaHousekeepingInterval()) {
    15                 log.warn("ha housekeeping, found this connection[" + HAConnection.this.clientAddr + "] expired, " + interval);
    16                 break;
    17             }
    18         } catch (Exception e) {
    19             HAConnection.log.error(this.getServiceName() + " service has exception.", e);
    20             break;
    21         }
    22     }
    23 
    24     this.makeStop();
    25 
    26     writeSocketService.makeStop();
    27 
    28     haService.removeConnection(HAConnection.this);
    29 
    30     HAConnection.this.haService.getConnectionCount().decrementAndGet();
    31 
    32     SelectionKey sk = this.socketChannel.keyFor(this.selector);
    33     if (sk != null) {
    34         sk.cancel();
    35     }
    36 
    37     try {
    38         this.selector.close();
    39         this.socketChannel.close();
    40     } catch (IOException e) {
    41         HAConnection.log.error("", e);
    42     }
    43 
    44     HAConnection.log.info(this.getServiceName() + " service end");
    45 }

    这里的while循环中首先也是通过selector的select方法,在超时时间内进行NIO的轮询

    轮询结束后的进一步的处理由processReadEvent来完成:

     1 private boolean processReadEvent() {
     2         int readSizeZeroTimes = 0;
     3 
     4         if (!this.byteBufferRead.hasRemaining()) {
     5             this.byteBufferRead.flip();
     6             this.processPostion = 0;
     7         }
     8 
     9         while (this.byteBufferRead.hasRemaining()) {
    10             try {
    11                 int readSize = this.socketChannel.read(this.byteBufferRead);
    12                 if (readSize > 0) {
    13                     readSizeZeroTimes = 0;
    14                     this.lastReadTimestamp = HAConnection.this.haService.getDefaultMessageStore().getSystemClock().now();
    15                     if ((this.byteBufferRead.position() - this.processPostion) >= 8) {
    16                         int pos = this.byteBufferRead.position() - (this.byteBufferRead.position() % 8);
    17                         long readOffset = this.byteBufferRead.getLong(pos - 8);
    18                         this.processPostion = pos;
    19 
    20                         HAConnection.this.slaveAckOffset = readOffset;
    21                         if (HAConnection.this.slaveRequestOffset < 0) {
    22                             HAConnection.this.slaveRequestOffset = readOffset;
    23                             log.info("slave[" + HAConnection.this.clientAddr + "] request offset " + readOffset);
    24                         }
    25 
    26                         HAConnection.this.haService.notifyTransferSome(HAConnection.this.slaveAckOffset);
    27                     }
    28                 } else if (readSize == 0) {
    29                     if (++readSizeZeroTimes >= 3) {
    30                         break;
    31                     }
    32                 } else {
    33                     log.error("read socket[" + HAConnection.this.clientAddr + "] < 0");
    34                     return false;
    35                 }
    36             } catch (IOException e) {
    37                 log.error("processReadEvent exception", e);
    38                 return false;
    39             }
    40         }
    41 
    42         return true;
    43     }
    44 }

    这个方法其实就是通过socketChannel的read方法,将slave发送过来的数据存入byteBufferRead中
    在确保发送过来的数据能达到8字节时,取出long类型的offset值,然后交给HAConnection的slaveAckOffset成员进行保存

    其中slaveRequestOffset是用来处理第一次连接时的同步

    notifyTransferSome方法是作为同步master时,进行相应的唤醒操作,异步master则没有要求,在后面具体分析


    也就是说ReadSocketService这个线程,只是不断地读取并更新slave发送来的offset数据

    再来看看WriteSocketService线程是如何进行向slave的发送:

      1 public void run() {
      2     HAConnection.log.info(this.getServiceName() + " service started");
      3 
      4     while (!this.isStopped()) {
      5         try {
      6             this.selector.select(1000);
      7 
      8             if (-1 == HAConnection.this.slaveRequestOffset) {
      9                 Thread.sleep(10);
     10                 continue;
     11             }
     12 
     13             if (-1 == this.nextTransferFromWhere) {
     14                 if (0 == HAConnection.this.slaveRequestOffset) {
     15                     long masterOffset = HAConnection.this.haService.getDefaultMessageStore().getCommitLog().getMaxOffset();
     16                     masterOffset =
     17                         masterOffset
     18                             - (masterOffset % HAConnection.this.haService.getDefaultMessageStore().getMessageStoreConfig()
     19                             .getMapedFileSizeCommitLog());
     20 
     21                     if (masterOffset < 0) {
     22                         masterOffset = 0;
     23                     }
     24 
     25                     this.nextTransferFromWhere = masterOffset;
     26                 } else {
     27                     this.nextTransferFromWhere = HAConnection.this.slaveRequestOffset;
     28                 }
     29 
     30                 log.info("master transfer data from " + this.nextTransferFromWhere + " to slave[" + HAConnection.this.clientAddr
     31                     + "], and slave request " + HAConnection.this.slaveRequestOffset);
     32             }
     33 
     34             if (this.lastWriteOver) {
     35 
     36                 long interval =
     37                     HAConnection.this.haService.getDefaultMessageStore().getSystemClock().now() - this.lastWriteTimestamp;
     38 
     39                 if (interval > HAConnection.this.haService.getDefaultMessageStore().getMessageStoreConfig()
     40                     .getHaSendHeartbeatInterval()) {
     41 
     42                     // Build Header
     43                     this.byteBufferHeader.position(0);
     44                     this.byteBufferHeader.limit(headerSize);
     45                     this.byteBufferHeader.putLong(this.nextTransferFromWhere);
     46                     this.byteBufferHeader.putInt(0);
     47                     this.byteBufferHeader.flip();
     48 
     49                     this.lastWriteOver = this.transferData();
     50                     if (!this.lastWriteOver)
     51                         continue;
     52                 }
     53             } else {
     54                 this.lastWriteOver = this.transferData();
     55                 if (!this.lastWriteOver)
     56                     continue;
     57             }
     58 
     59             SelectMappedBufferResult selectResult =
     60                 HAConnection.this.haService.getDefaultMessageStore().getCommitLogData(this.nextTransferFromWhere);
     61             if (selectResult != null) {
     62                 int size = selectResult.getSize();
     63                 if (size > HAConnection.this.haService.getDefaultMessageStore().getMessageStoreConfig().getHaTransferBatchSize()) {
     64                     size = HAConnection.this.haService.getDefaultMessageStore().getMessageStoreConfig().getHaTransferBatchSize();
     65                 }
     66 
     67                 long thisOffset = this.nextTransferFromWhere;
     68                 this.nextTransferFromWhere += size;
     69 
     70                 selectResult.getByteBuffer().limit(size);
     71                 this.selectMappedBufferResult = selectResult;
     72 
     73                 // Build Header
     74                 this.byteBufferHeader.position(0);
     75                 this.byteBufferHeader.limit(headerSize);
     76                 this.byteBufferHeader.putLong(thisOffset);
     77                 this.byteBufferHeader.putInt(size);
     78                 this.byteBufferHeader.flip();
     79 
     80                 this.lastWriteOver = this.transferData();
     81             } else {
     82 
     83                 HAConnection.this.haService.getWaitNotifyObject().allWaitForRunning(100);
     84             }
     85         } catch (Exception e) {
     86 
     87             HAConnection.log.error(this.getServiceName() + " service has exception.", e);
     88             break;
     89         }
     90     }
     91 
     92     HAConnection.this.haService.getWaitNotifyObject().removeFromWaitingThreadTable();
     93 
     94     if (this.selectMappedBufferResult != null) {
     95         this.selectMappedBufferResult.release();
     96     }
     97 
     98     this.makeStop();
     99 
    100     readSocketService.makeStop();
    101 
    102     haService.removeConnection(HAConnection.this);
    103 
    104     SelectionKey sk = this.socketChannel.keyFor(this.selector);
    105     if (sk != null) {
    106         sk.cancel();
    107     }
    108 
    109     try {
    110         this.selector.close();
    111         this.socketChannel.close();
    112     } catch (IOException e) {
    113         HAConnection.log.error("", e);
    114     }
    115 
    116     HAConnection.log.info(this.getServiceName() + " service end");
    117 }

    这里一开始会对slaveRequestOffset进行一次判断,当且仅当slaveRequestOffset初始化的时候是才是-1

    也就是说当slave还没有发送过来offset时,WriteSocketService线程只会干等

    当slave发送来offset后
    首先对nextTransferFromWhere进行了判断,nextTransferFromWhere和slaveRequestOffset一样,在初始化的时候为-1
    也就代表着master和slave刚刚建立连接,并没有进行过一次消息的同步!

    此时会对修改了的slaveRequestOffset进行判断
    若是等于0,说明slave没有任何消息的历史记录,那么此时master会取得自身的MaxOffset,根据这个MaxOffset,通过:

    1 masterOffset =  masterOffset
    2                 - (masterOffset % HAConnection.this.haService.getDefaultMessageStore().getMessageStoreConfig()
    3                 .getMapedFileSizeCommitLog() /* 1G */);

    计算出最后一个文件开始的offset
    也就是说,当slave没有消息的历史记录,master只会从本地最后一个CommitLog文件开始的地方,将消息数据发送给slave


    若是slave有数据,就从slave发送来的offset的位置起,进行发送,通过nextTransferFromWhere记录这个offset值


    接着对lastWriteOver进行了判断,lastWriteOver是一个状态量,用来表示上次发送是否传输完毕,初始化是true


    若是true,这里会进行一次时间检查,lastWriteTimestamp记录最后一次发送的时间
    一次来判断是否超过了时间间隔haSendHeartbeatInterval(默认5s)
    也就是说至少有5s,master没有向slave发送任何消息
    那么此时就会发送一个心跳包

    其中byteBufferHeader是一个12字节的ByteBuffer:

    1 private final int headerSize = 8 + 4;
    2 private final ByteBuffer byteBufferHeader = ByteBuffer.allocate(headerSize);

    这里就简单地构造了一个心跳包,后续通过transferData方法来完成数据的发送


    若是 lastWriteOver为false,则表示上次数据没有发送完,就需要通过transferData方法,将剩余数据继续发送,只要没发送完,只会重复循环,直到发完


    先继续往下看,下面就是发送具体的消息数据了:
    首先根据nextTransferFromWhere,也就是刚才保存的offset,通过DefaultMessageStore的getCommitLogData方法,其实际上调用的是CommitLog的getData方法,这个方法在

    【RocketMQ中Broker的启动源码分析(二)】中关于消息调度(ReputMessageService)时详细介绍过


    根据offset找到对应的CommitLog文件,将其从offset对应起始处所有数据读入ByteBuffer中,由SelectMappedBufferResult封装


    这里若是master已将将所有本地数据同步给了slave,那么得到的SelectMappedBufferResult就会为null,会调用:

    1 HAConnection.this.haService.getWaitNotifyObject().allWaitForRunning(100);

    将自身阻塞,超时等待100ms,要么一直等到超时时间到了,要么就会在后面所讲的同步双传中被同步master唤醒

    在得到SelectMappedBufferResult后,这里会对读取到的数据大小进行一次判断,若是大于haTransferBatchSize(默认32K),将size改为32K,实际上就是对发送数据大小的限制,大于32K会切割,每次最多只允许发送32k


    通过thisOffset记录nextTransferFromWhere即offset
    更新nextTransferFromWhere值,以便下一次定位
    还会将读取到的数据结果selectResult交给selectMappedBufferResult保存

    然后构建消息头,这里就和心跳包格式一样,前八字节存放offset,后四字节存放数据大小


    最后调用transferData方法,进行发送:

     1 private boolean transferData() throws Exception {
     2     int writeSizeZeroTimes = 0;
     3     // Write Header
     4     while (this.byteBufferHeader.hasRemaining()) {
     5         int writeSize = this.socketChannel.write(this.byteBufferHeader);
     6         if (writeSize > 0) {
     7             writeSizeZeroTimes = 0;
     8             this.lastWriteTimestamp = HAConnection.this.haService.getDefaultMessageStore().getSystemClock().now();
     9         } else if (writeSize == 0) {
    10             if (++writeSizeZeroTimes >= 3) {
    11                 break;
    12             }
    13         } else {
    14             throw new Exception("ha master write header error < 0");
    15         }
    16     }
    17 
    18     if (null == this.selectMappedBufferResult) {
    19         return !this.byteBufferHeader.hasRemaining();
    20     }
    21 
    22     writeSizeZeroTimes = 0;
    23 
    24     // Write Body
    25     if (!this.byteBufferHeader.hasRemaining()) {
    26         while (this.selectMappedBufferResult.getByteBuffer().hasRemaining()) {
    27             int writeSize = this.socketChannel.write(this.selectMappedBufferResult.getByteBuffer());
    28             if (writeSize > 0) {
    29                 writeSizeZeroTimes = 0;
    30                 this.lastWriteTimestamp = HAConnection.this.haService.getDefaultMessageStore().getSystemClock().now();
    31             } else if (writeSize == 0) {
    32                 if (++writeSizeZeroTimes >= 3) {
    33                     break;
    34                 }
    35             } else {
    36                 throw new Exception("ha master write body error < 0");
    37             }
    38         }
    39     }
    40 
    41     boolean result = !this.byteBufferHeader.hasRemaining() && !this.selectMappedBufferResult.getByteBuffer().hasRemaining();
    42 
    43     if (!this.selectMappedBufferResult.getByteBuffer().hasRemaining()) {
    44         this.selectMappedBufferResult.release();
    45         this.selectMappedBufferResult = null;
    46     }
    47 
    48     return result;
    49 }

    首先将byteBufferHeader中的12字节消息头通过socketChannel的write方法发送出去
    然后将selectMappedBufferResult中的ByteBuffer的消息数据发送出去

    若是selectMappedBufferResult等于null,说明是心跳包,只发送消息头
    无论发送什么都会将时间记录在lastWriteTimestamp中,以便后续发送心跳包的判断


    看到这里其实就会发现WriteSocketService线程开启后,只要slave向master发出了第一个offset后,WriteSocketService线程都会不断地将对应位置自己本地的CommitLog文件中的内容发送给slave,直到完全同步后,WriteSocketService线程才会稍微缓缓,进入阻塞100ms以及每隔五秒发一次心跳包的状态

    但是只要当Producer向master发送来消息后,由刷盘线程完成持久化后,WriteSocketService线程又会忙碌起来,此时也才是体现同步双写异步复制的时候

    先不急着说这个,来看看slave接收到消息是如何处理的:


    是在HAClient的线程中的processReadEvent方法处理的:

     1 private boolean processReadEvent() {
     2     int readSizeZeroTimes = 0;
     3     while (this.byteBufferRead.hasRemaining()) {
     4         try {
     5             int readSize = this.socketChannel.read(this.byteBufferRead);
     6             if (readSize > 0) {
     7                 lastWriteTimestamp = HAService.this.defaultMessageStore.getSystemClock().now();
     8                 readSizeZeroTimes = 0;
     9                 boolean result = this.dispatchReadRequest();
    10                 if (!result) {
    11                     log.error("HAClient, dispatchReadRequest error");
    12                     return false;
    13                 }
    14             } else if (readSize == 0) {
    15                 if (++readSizeZeroTimes >= 3) {
    16                     break;
    17                 }
    18             } else {
    19                 log.info("HAClient, processReadEvent read socket < 0");
    20                 return false;
    21             }
    22         } catch (IOException e) {
    23             log.info("HAClient, processReadEvent read socket exception", e);
    24             return false;
    25         }
    26     }
    27 
    28     return true;
    29 }

    在socketChannel通过read方法将master发送的数据读取到byteBufferRead缓冲区后,由dispatchReadRequest方法做进一步处理


    dispatchReadRequest方法:

     1 private boolean dispatchReadRequest() {
     2     final int msgHeaderSize = 8 + 4; // phyoffset + size
     3     int readSocketPos = this.byteBufferRead.position();
     4 
     5     while (true) {
     6         int diff = this.byteBufferRead.position() - this.dispatchPostion;
     7         if (diff >= msgHeaderSize) {
     8             long masterPhyOffset = this.byteBufferRead.getLong(this.dispatchPostion);
     9             int bodySize = this.byteBufferRead.getInt(this.dispatchPostion + 8);
    10 
    11             long slavePhyOffset = HAService.this.defaultMessageStore.getMaxPhyOffset();
    12 
    13             if (slavePhyOffset != 0) {
    14                 if (slavePhyOffset != masterPhyOffset) {
    15                     log.error("master pushed offset not equal the max phy offset in slave, SLAVE: "
    16                         + slavePhyOffset + " MASTER: " + masterPhyOffset);
    17                     return false;
    18                 }
    19             }
    20 
    21             if (diff >= (msgHeaderSize + bodySize)) {
    22                 byte[] bodyData = new byte[bodySize];
    23                 this.byteBufferRead.position(this.dispatchPostion + msgHeaderSize);
    24                 this.byteBufferRead.get(bodyData);
    25 
    26                 HAService.this.defaultMessageStore.appendToCommitLog(masterPhyOffset, bodyData);
    27 
    28                 this.byteBufferRead.position(readSocketPos);
    29                 this.dispatchPostion += msgHeaderSize + bodySize;
    30 
    31                 if (!reportSlaveMaxOffsetPlus()) {
    32                     return false;
    33                 }
    34 
    35                 continue;
    36             }
    37         }
    38 
    39         if (!this.byteBufferRead.hasRemaining()) {
    40             this.reallocateByteBuffer();
    41         }
    42 
    43         break;
    44     }
    45 
    46     return true;
    47 }

    这里就首先将12字节的消息头取出来
    masterPhyOffset:8字节offset ,bodySize :4字节消息大小
    根据master发来的masterPhyOffset会和自己本地的slavePhyOffset进行校验,以便安全备份


    之后就会将byteBufferRead中存放在消息头后面的消息数据取出来,调用appendToCommitLog方法持久化到的CommitLog中

     1 public boolean appendToCommitLog(long startOffset, byte[] data) {
     2     if (this.shutdown) {
     3         log.warn("message store has shutdown, so appendToPhyQueue is forbidden");
     4         return false;
     5     }
     6 
     7     boolean result = this.commitLog.appendData(startOffset, data);
     8     if (result) {
     9         this.reputMessageService.wakeup();
    10     } else {
    11         log.error("appendToPhyQueue failed " + startOffset + " " + data.length);
    12     }
    13 
    14     return result;
    15 }

    实际上调用了commitLog的appendData方法将其写入磁盘,这个方法我在前面博客中介绍过

    【RocketMQ中Broker的刷盘源码分析】


    在完成写入后,需要唤醒reputMessageService消息调度,以便Consumer的消费
    关于消息调度详见  【RocketMQ中Broker的启动源码分析(二)】


    当然前面说过master还会发送心跳消息,但这里明显没对心跳消息进行处理,只是appendToCommitLog调用时,传入了一个大小为0的byte数组,显然有些不合理,想不通


    在完成后,还会调用reportSlaveMaxOffsetPlus方法:

     1 private boolean reportSlaveMaxOffsetPlus() {
     2     boolean result = true;
     3     long currentPhyOffset = HAService.this.defaultMessageStore.getMaxPhyOffset();
     4     if (currentPhyOffset > this.currentReportedOffset) {
     5         this.currentReportedOffset = currentPhyOffset;
     6         result = this.reportSlaveMaxOffset(this.currentReportedOffset);
     7         if (!result) {
     8             this.closeMaster();
     9             log.error("HAClient, reportSlaveMaxOffset error, " + this.currentReportedOffset);
    10         }
    11     }
    12 
    13     return result;
    14 }

    由于完成了写入,那么此时获取到的offset肯定比currentReportedOffset中保存的大,然后再次通过reportSlaveMaxOffset方法,将当前的offset报告给master


    这其实上已经完成了异步master的异步复制过程


    再来看看同步双写是如何实现的:
    和刷盘一样,都是在Producer发送完消息,Broker进行完消息的存储后进行的

    【RocketMQ中Broker的消息存储源码分析】

    在CommitLog的handleHA方法:

     1 public void handleHA(AppendMessageResult result, PutMessageResult putMessageResult, MessageExt messageExt) {
     2     if (BrokerRole.SYNC_MASTER == this.defaultMessageStore.getMessageStoreConfig().getBrokerRole()) {
     3         HAService service = this.defaultMessageStore.getHaService();
     4         if (messageExt.isWaitStoreMsgOK()) {
     5             // Determine whether to wait
     6             if (service.isSlaveOK(result.getWroteOffset() + result.getWroteBytes())) {
     7                 GroupCommitRequest request = new GroupCommitRequest(result.getWroteOffset() + result.getWroteBytes());
     8                 service.putRequest(request);
     9                 service.getWaitNotifyObject().wakeupAll();
    10                 boolean flushOK =
    11                     request.waitForFlush(this.defaultMessageStore.getMessageStoreConfig().getSyncFlushTimeout());
    12                 if (!flushOK) {
    13                     log.error("do sync transfer other node, wait return, but failed, topic: " + messageExt.getTopic() + " tags: "
    14                         + messageExt.getTags() + " client address: " + messageExt.getBornHostNameString());
    15                     putMessageResult.setPutMessageStatus(PutMessageStatus.FLUSH_SLAVE_TIMEOUT);
    16                 }
    17             }
    18             // Slave problem
    19             else {
    20                 // Tell the producer, slave not available
    21                 putMessageResult.setPutMessageStatus(PutMessageStatus.SLAVE_NOT_AVAILABLE);
    22             }
    23         }
    24     }
    25 
    26 }

    这里就会检查Broker的类型,看以看到只对SYNC_MASTER即同步master进行了操作

    这个操作过程其实就和同步刷盘类似

    【RocketMQ中Broker的刷盘源码分析】

    根据Offset+WroteBytes创建一条记录GroupCommitRequest,然后会将添加在List中
    然后调用getWaitNotifyObject的wakeupAll方法,把阻塞中的所有WriteSocketService线程唤醒
    因为master和slave是一对多的关系,那么这里就会有多个slave连接,也就有多个WriteSocketService线程,保证消息能同步到所有slave中

    在唤醒WriteSocketService线程工作后,调用request的waitForFlush方法,将自身阻塞,预示着同步复制的真正开启


    在HAService开启时,还开启了一个GroupTransferService线程:

     1 public void run() {
     2     log.info(this.getServiceName() + " service started");
     3 
     4     while (!this.isStopped()) {
     5         try {
     6             this.waitForRunning(10);
     7             this.doWaitTransfer();
     8         } catch (Exception e) {
     9             log.warn(this.getServiceName() + " service has exception. ", e);
    10         }
    11     }
    12 
    13     log.info(this.getServiceName() + " service end");
    14 }

    这里的工作原理和同步刷盘GroupCommitService基本一致,相似的地方我就不仔细分析了


    GroupTransferService同样保存两张List:

    1 private volatile List<CommitLog.GroupCommitRequest> requestsWrite = new ArrayList<>();
    2 private volatile List<CommitLog.GroupCommitRequest> requestsRead = new ArrayList<>();

    由这两张List做一个类似JVM新生代的复制算法
    在handleHA方法中,就会将创建的GroupCommitRequest记录添加在requestsWrite这个List中


    其中doWaitTransfer方法:

     1 private void doWaitTransfer() {
     2     synchronized (this.requestsRead) {
     3         if (!this.requestsRead.isEmpty()) {
     4             for (CommitLog.GroupCommitRequest req : this.requestsRead) {
     5                 boolean transferOK = HAService.this.push2SlaveMaxOffset.get() >= req.getNextOffset();
     6                 for (int i = 0; !transferOK && i < 5; i++) {
     7                     this.notifyTransferObject.waitForRunning(1000);
     8                     transferOK = HAService.this.push2SlaveMaxOffset.get() >= req.getNextOffset();
     9                 }
    10 
    11                 if (!transferOK) {
    12                     log.warn("transfer messsage to slave timeout, " + req.getNextOffset());
    13                 }
    14 
    15                 req.wakeupCustomer(transferOK);
    16             }
    17 
    18             this.requestsRead.clear();
    19         }
    20     }
    21 }

    和刷盘一样,这里会通过复制算法,将requestsWrite和requestsRead进行替换,那么这里的requestsRead实际上就存放着刚才添加的记录


    首先取出记录中的NextOffset和push2SlaveMaxOffset比较

    push2SlaveMaxOffset值是通过slave发送过来的,在之前说过的ReadSocketService线程中的:

    1 HAConnection.this.haService.notifyTransferSome(HAConnection.this.slaveAckOffset);

    notifyTransferSome方法:

     1 public void notifyTransferSome(final long offset) {
     2     for (long value = this.push2SlaveMaxOffset.get(); offset > value; ) {
     3         boolean ok = this.push2SlaveMaxOffset.compareAndSet(value, offset);
     4         if (ok) {
     5             this.groupTransferService.notifyTransferSome();
     6             break;
     7         } else {
     8             value = this.push2SlaveMaxOffset.get();
     9         }
    10     }
    11 }

    即便也多个slave连接,这里的push2SlaveMaxOffset永远会记录最大的那个offset


    所以在doWaitTransfer中,根据当前NextOffset(完成写入后master本地的offset),进行判断

    其实这里主要要考虑到WriteSocketService线程的工作原理,只要本地文件有更新,那么就会向slave发送数据,所以这里由于HA同步是发生在刷盘后的,那么就有可能在这个doWaitTransfer执行前,有slave已经将数据进行了同步,并且向master报告了自己offset,更新了push2SlaveMaxOffset的值

    那么

    1 boolean transferOK = HAService.this.push2SlaveMaxOffset.get() >= req.getNextOffset();
    2 ```

    这个判断就会为真,意味着节点中已经有了备份,所以就会直接调用

    1 req.wakeupCustomer(transferOK);

    以此来唤醒刚才在handleHA方法中的阻塞


    若是判断为假,就说明没有一个slave完成同步,就需要

    1 for (int i = 0; !transferOK && i < 5; i++) {
    2     this.notifyTransferObject.waitForRunning(1000);
    3     transferOK = HAService.this.push2SlaveMaxOffset.get() >= req.getNextOffset();
    4 }

    通过waitForRunning进行阻塞,超时等待,最多五次等待,超过时间会向Producer发送FLUSH_SLAVE_TIMEOUT


    若是在超时时间内,有slave完成了同步,并向master发送了offset后,在notifyTransferSome方法中:

     1 public void notifyTransferSome(final long offset) {
     2     for (long value = this.push2SlaveMaxOffset.get(); offset > value; ) {
     3         boolean ok = this.push2SlaveMaxOffset.compareAndSet(value, offset);
     4         if (ok) {
     5             this.groupTransferService.notifyTransferSome();
     6             break;
     7         } else {
     8             value = this.push2SlaveMaxOffset.get();
     9         }
    10     }
    11 }

    就会更新push2SlaveMaxOffset,并通过notifyTransferSome唤醒上面所说的阻塞

    然后再次判断push2SlaveMaxOffset和getNextOffset
    成功后唤醒刚才在handleHA方法中的阻塞,同步master的主从复制也就结束
    由于同步master的刷盘是在主从复制前发生的,所以同步双写意味着master和slave都会完成消息的持久化

    至此,RocketMQ中Broker的HA策略分析到此结束

  • 相关阅读:
    LVM逻辑卷管理练习
    浅谈TCP三次握手和四次分手
    centos模拟创建CA和申请证书
    破解root口令
    shell脚本编程进阶总结
    基于FIFO的串口发送机设计
    流水线方式LUT查表法乘法器
    verilog中有符号整数说明及除法实现
    LUT查表法乘法器所犯下错误。。。。
    似然函数
  • 原文地址:https://www.cnblogs.com/a526583280/p/11318884.html
Copyright © 2020-2023  润新知