• Data storage in Kafka and RocketMQ


    Kafka version: 1.1.1

    Each partition corresponds to a directory; within it, data is stored as segment files, 1 GB per segment by default.


    How are segment files named?

    The segment-rolling logic is in kafka.log.Log#roll:

      /**
       * Roll the log over to a new active segment starting with the current logEndOffset.
       * This will trim the index to the exact size of the number of entries it currently contains.
       *
       * @return The newly rolled segment
       */
      def roll(expectedNextOffset: Option[Long] = None): LogSegment = {
        maybeHandleIOException(s"Error while rolling log segment for $topicPartition in dir ${dir.getParent}") {
          val start = time.hiResClockMs()
          lock synchronized {
            checkIfMemoryMappedBufferClosed()
            val newOffset = math.max(expectedNextOffset.getOrElse(0L), logEndOffset)
            // the new log file, e.g. 00000000000030898257.log
            val logFile = Log.logFile(dir, newOffset)
    
            if (segments.containsKey(newOffset)) {
              // segment with the same base offset already exists and loaded
              if (activeSegment.baseOffset == newOffset && activeSegment.size == 0) {
                // We have seen this happen (see KAFKA-6388) after shouldRoll() returns true for an
                // active segment of size zero because of one of the indexes is "full" (due to _maxEntries == 0).
                warn(s"Trying to roll a new log segment with start offset $newOffset " +
                     s"=max(provided offset = $expectedNextOffset, LEO = $logEndOffset) while it already " +
                     s"exists and is active with size 0. Size of time index: ${activeSegment.timeIndex.entries}," +
                     s" size of offset index: ${activeSegment.offsetIndex.entries}.")
                deleteSegment(activeSegment)
              } else {
                throw new KafkaException(s"Trying to roll a new log segment for topic partition $topicPartition with start offset $newOffset" +
                                         s" =max(provided offset = $expectedNextOffset, LEO = $logEndOffset) while it already exists. Existing " +
                                         s"segment is ${segments.get(newOffset)}.")
              }
            } else if (!segments.isEmpty && newOffset < activeSegment.baseOffset) {
              throw new KafkaException(
                s"Trying to roll a new log segment for topic partition $topicPartition with " +
                s"start offset $newOffset =max(provided offset = $expectedNextOffset, LEO = $logEndOffset) lower than start offset of the active segment $activeSegment")
            } else {
              val offsetIdxFile = offsetIndexFile(dir, newOffset)
              val timeIdxFile = timeIndexFile(dir, newOffset)
              val txnIdxFile = transactionIndexFile(dir, newOffset)
              for (file <- List(logFile, offsetIdxFile, timeIdxFile, txnIdxFile) if file.exists) {
                warn(s"Newly rolled segment file ${file.getAbsolutePath} already exists; deleting it first")
                Files.delete(file.toPath)
              }
    
              Option(segments.lastEntry).foreach(_.getValue.onBecomeInactiveSegment())
            }
    
            // take a snapshot of the producer state to facilitate recovery. It is useful to have the snapshot
            // offset align with the new segment offset since this ensures we can recover the segment by beginning
            // with the corresponding snapshot file and scanning the segment data. Because the segment base offset
            // may actually be ahead of the current producer state end offset (which corresponds to the log end offset),
            // we manually override the state offset here prior to taking the snapshot.
            producerStateManager.updateMapEndOffset(newOffset)
            producerStateManager.takeSnapshot()
    
            val segment = LogSegment.open(dir,
              baseOffset = newOffset,
              config,
              time = time,
              fileAlreadyExists = false,
              initFileSize = initFileSize,
              preallocate = config.preallocate)
            addSegment(segment)
            // We need to update the segment base offset and append position data of the metadata when log rolls.
            // The next offset should not change.
            updateLogEndOffset(nextOffsetMetadata.messageOffset)
            // schedule an asynchronous flush of the old segment
            scheduler.schedule("flush-log", () => flush(newOffset), delay = 0L)
    
            info(s"Rolled new log segment at offset $newOffset in ${time.hiResClockMs() - start} ms.")
    
            segment
          }
        }
      }

    As the code shows, the new segment takes max(provided offset, logEndOffset) as its base offset and file name, i.e. a segment file is named after the offset of the first message it will contain.
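
    As a side note, here is a minimal Java sketch of that naming scheme (the offset value is hypothetical): the base offset is zero-padded to 20 digits and reused for the segment's .log, .index and .timeindex files. This mimics Kafka's behavior; it is not Kafka's actual (Scala) code.

    // Sketch only: pad the base offset to 20 digits and append the file suffix.
    static String segmentFileName(long baseOffset, String suffix) {
        return String.format("%020d%s", baseOffset, suffix);
    }
    // segmentFileName(30898257L, ".log")       -> 00000000000030898257.log
    // segmentFileName(30898257L, ".index")     -> 00000000000030898257.index
    // segmentFileName(30898257L, ".timeindex") -> 00000000000030898257.timeindex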

    Each log file has a companion index file with the same base name. The index stores offset/position pairs; each entry is two ints, 8 bytes in total.

    kafka.log.OffsetIndex#append

      /**
       * Append an entry for the given offset/location pair to the index. This entry must have a larger offset than all subsequent entries.
       */
      def append(offset: Long, position: Int) {
        inLock(lock) {
          require(!isFull, "Attempt to append to a full index (size = " + _entries + ").")
          if (_entries == 0 || offset > _lastOffset) {
            trace(s"Adding index entry $offset => $position to ${file.getAbsolutePath}")
            // relative offset (offset - baseOffset), stored as a 4-byte int
            mmap.putInt((offset - baseOffset).toInt)
            // physical position of the message in the log file
            mmap.putInt(position)
            _entries += 1
            _lastOffset = offset
            require(_entries * entrySize == mmap.position(), entries + " entries but file position in index is " + mmap.position() + ".")
          } else {
            throw new InvalidOffsetException(s"Attempt to append an offset ($offset) to position $entries no larger than" +
              s" the last offset appended (${_lastOffset}) to ${file.getAbsolutePath}.")
          }
        }
      }
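
    On the read path, Kafka binary-searches this sparse index for the largest relative offset not greater than the target, then scans the log file forward from the returned position. A minimal Java sketch of that lookup, with a packed int array standing in for the memory-mapped index (illustrative only, not Kafka's actual code):

    // entries[2*i] = relative offset of entry i, entries[2*i + 1] = physical position in the .log file
    static int lookupPosition(int[] entries, int targetRelativeOffset) {
        int lo = 0, hi = entries.length / 2 - 1, pos = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (entries[2 * mid] <= targetRelativeOffset) {
                pos = entries[2 * mid + 1]; // best candidate so far
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return pos; // the caller scans the .log file forward from here to the exact offset
    }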

    Diagram borrowed from the RocketMQ storage design doc: http://rocketmq.cloud/zh-cn/docs/design-store.html

    RocketMQ stores data differently from Kafka, splitting it into a commitlog and consumequeues:

    Messages of all topics are appended to the commitlog. The commitlog is split into 1 GB files by default, and each file is named after its starting physical offset.

    The per-queue index lives under the consumequeue/topic/queueId directory. Each entry is a fixed 20 bytes: an 8-byte commitlog physical offset, a 4-byte message length, and an 8-byte tag hashcode.

    The on-disk formats of the commitLog and consumeQueue can be worked out from the code.

    Default file sizes:

    // org.apache.rocketmq.store.config.MessageStoreConfig
    // CommitLog file size, default is 1G
    private int mapedFileSizeCommitLog = 1024 * 1024 * 1024;
    // ConsumeQueue file size: 300,000 entries by default, i.e. 300000 * 20 bytes ≈ 6 MB
    private int mapedFileSizeConsumeQueue = 300000 * ConsumeQueue.CQ_STORE_UNIT_SIZE;
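
    Like Kafka, RocketMQ names each mapped file after the physical offset at which it starts, zero-padded to 20 digits. A quick sketch of the arithmetic (the file indexes are hypothetical; the sizes are the defaults above):

    // the n-th mapped file starts at n * mappedFileSize, and that offset is the file name
    static String mappedFileName(long fileIndex, long mappedFileSize) {
        return String.format("%020d", fileIndex * mappedFileSize);
    }
    // mappedFileName(1, 1024L * 1024 * 1024) -> 00000000001073741824  (second commitlog file)
    // mappedFileName(1, 300000L * 20)        -> 00000000000006000000  (second consumequeue file)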

    This method clearly shows the storage format of a commitLog entry:

    // org.apache.rocketmq.store.CommitLog#calMsgLength
    private static int calMsgLength(int bodyLength, int topicLength, int propertiesLength) {
        final int msgLen = 4 //TOTALSIZE
            + 4 //MAGICCODE
            + 4 //BODYCRC
            + 4 //QUEUEID
            + 4 //FLAG
            + 8 //QUEUEOFFSET
            + 8 //PHYSICALOFFSET
            + 4 //SYSFLAG
            + 8 //BORNTIMESTAMP
            + 8 //BORNHOST
            + 8 //STORETIMESTAMP
            + 8 //STOREHOSTADDRESS
            + 4 //RECONSUMETIMES
            + 8 //Prepared Transaction Offset
            + 4 + (bodyLength > 0 ? bodyLength : 0) //BODY
            + 1 + topicLength //TOPIC
            + 2 + (propertiesLength > 0 ? propertiesLength : 0) //propertiesLength
            + 0;
        return msgLen;
    }
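
    To make the layout concrete, a worked example with hypothetical values (a 100-byte body, a 9-byte topic name, no properties); the fixed fields add up to 84 bytes, plus the three length prefixes:

    // fixed fields : 4+4+4+4+4+8+8+4+8+8+8+8+4+8 = 84
    // BODY         : 4 + 100                     = 104
    // TOPIC        : 1 + 9                       = 10
    // properties   : 2 + 0                       = 2
    int msgLen = calMsgLength(100, 9, 0);         // 84 + 104 + 10 + 2 = 200 bytes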

    When messages are pulled by queue (partition) offset, the consumeQueue acts like an index: each 20-byte entry holds the commitLog offset, the message size, and the tag hashcode. For delayed messages, the tags-code field stores the scheduled delivery time instead.

    boolean result = this.putMessagePositionInfo(request.getCommitLogOffset(), request.getMsgSize(), tagsCode, request.getConsumeQueueOffset());
    
    // org.apache.rocketmq.store.ConsumeQueue#putMessagePositionInfo
    private boolean putMessagePositionInfo(final long offset, final int size, final long tagsCode, final long cqOffset) {
        if (offset <= this.maxPhysicOffset) {
            return true;
        }
    
        this.byteBufferIndex.flip();
        this.byteBufferIndex.limit(CQ_STORE_UNIT_SIZE);
        // 8 + 4 + 8 = 20 bytes per entry
        this.byteBufferIndex.putLong(offset); // physical offset in the commitLog
        this.byteBufferIndex.putInt(size); // message size
        this.byteBufferIndex.putLong(tagsCode); // 8-byte tag hashcode
    
        ...
    }
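
    The fixed 20-byte entry size is what makes lookups by logical queue offset cheap: the entry for queue offset N lives at byte N * 20 within the consumequeue files. A hedged sketch of the read side (the names and the fileStartOffset parameter are illustrative, not RocketMQ's actual code):

    static final int CQ_STORE_UNIT_SIZE = 20; // 8 (commitLog offset) + 4 (size) + 8 (tags code)

    // cqFile: a buffer over one consumequeue file; fileStartOffset: byte offset of the queue at which this file starts
    static void readEntry(java.nio.ByteBuffer cqFile, long queueOffset, long fileStartOffset) {
        int pos = (int) (queueOffset * CQ_STORE_UNIT_SIZE - fileStartOffset);
        cqFile.position(pos);
        long commitLogOffset = cqFile.getLong(); // where the message starts in the commitLog
        int msgSize = cqFile.getInt();           // how many bytes to read from the commitLog
        long tagsCode = cqFile.getLong();        // used for broker-side tag filtering
    }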

    The broker also builds an index on a message's UNIQ_KEY and on topic + "#" + key; the index file is essentially an on-disk hashmap.

    // org.apache.rocketmq.store.index.IndexFile
    // 40 + 5000000*4 + 20000000*20
    int fileTotalSize = IndexHeader.INDEX_HEADER_SIZE + (hashSlotNum * hashSlotSize) + (indexNum * indexSize);
    // one index file is about 420 MB; a new file is created once it is full

    The index file is just a hashmap on disk; when querying messages by key, the broker scans all index files.

    File layout:

    index header
    hash slots
    index entries

    // org.apache.rocketmq.store.index.IndexFile#putKey
    // each index entry is 20 bytes: keyHash (4) + phyOffset (8) + timeDiff (4) + slotValue (4)
    this.mappedByteBuffer.putInt(absIndexPos, keyHash);
    this.mappedByteBuffer.putLong(absIndexPos + 4, phyOffset);
    this.mappedByteBuffer.putInt(absIndexPos + 4 + 8, (int) timeDiff);
    // slotValue is the sequence number of the previous entry that hashed to the same slot
    this.mappedByteBuffer.putInt(absIndexPos + 4 + 8 + 4, slotValue);
    // write the current entry's sequence number into the hash slot
    this.mappedByteBuffer.putInt(absSlotPos, this.indexHeader.getIndexCount());
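
    Putting the header, the hash slots and the entries together, here is a hedged sketch of how a key is looked up: hash the key to a slot, read the latest entry number from that slot, then walk the slotValue chain backwards through entries sharing that slot (the constants follow the sizes above; the names are illustrative, not RocketMQ's actual code):

    static final int HEADER_SIZE = 40;
    static final int HASH_SLOT_NUM = 5_000_000;
    static final int HASH_SLOT_SIZE = 4;
    static final int INDEX_SIZE = 20;

    static int absSlotPos(int keyHash) {
        return HEADER_SIZE + (Math.abs(keyHash) % HASH_SLOT_NUM) * HASH_SLOT_SIZE;
    }

    static int absIndexPos(int indexCount) {
        return HEADER_SIZE + HASH_SLOT_NUM * HASH_SLOT_SIZE + indexCount * INDEX_SIZE;
    }

    static void lookup(java.nio.MappedByteBuffer buf, int keyHash) {
        int entryNo = buf.getInt(absSlotPos(keyHash)); // latest entry number written to this slot
        while (entryNo > 0) {
            int pos = absIndexPos(entryNo);
            int storedHash = buf.getInt(pos);           // compare with keyHash to filter out collisions
            long phyOffset = buf.getLong(pos + 4);      // candidate commitLog offset
            entryNo = buf.getInt(pos + 4 + 8 + 4);      // slotValue: previous entry in the chain
        }
    }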

    After the commitLog is written, RocketMQ builds the consumeQueue and indexFile asynchronously; this is triggered in

    org.apache.rocketmq.store.DefaultMessageStore.ReputMessageService#doReput

    // org.apache.rocketmq.store.DefaultMessageStore#DefaultMessageStore
    this.dispatcherList = new LinkedList<>();
    this.dispatcherList.addLast(new CommitLogDispatcherBuildConsumeQueue());
    this.dispatcherList.addLast(new CommitLogDispatcherBuildIndex());
    // org.apache.rocketmq.store.DefaultMessageStore#doDispatch
    public void doDispatch(DispatchRequest req) {
        for (CommitLogDispatcher dispatcher : this.dispatcherList) {
            dispatcher.dispatch(req);
        }
    }
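
    The dispatcher list is the extension point here: every DispatchRequest replayed by doReput is handed to each CommitLogDispatcher in turn. A minimal hedged sketch of what an extra dispatcher could look like (the class below is hypothetical; the interface and the getters it uses appear in the code above):

    // Hypothetical dispatcher, for illustration only: logs every replayed message position.
    public class CommitLogDispatcherLogPosition implements CommitLogDispatcher {
        @Override
        public void dispatch(DispatchRequest request) {
            System.out.printf("commitLog offset=%d, size=%d, consumeQueue offset=%d%n",
                request.getCommitLogOffset(), request.getMsgSize(), request.getConsumeQueueOffset());
        }
    }
    // registered like the built-in dispatchers:
    // this.dispatcherList.addLast(new CommitLogDispatcherLogPosition());
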
  • Original post: https://www.cnblogs.com/allenwas3/p/11505242.html