• [Source Code] Flink StreamingFileSink Processing Flow


    A couple of days ago I tried writing to Hive with Flink SQL and was puzzled by the part of the sink that writes data to HDFS, in particular the checkpoint-based file commit. So I read through the StreamingFileSink source code (the Flink SQL Hive sink reuses this code).

    StreamingFileSink was introduced in Flink 1.6 as the community's reworked replacement for BucketingSink; BucketingSink has been marked as deprecated since Flink 1.9.

    The StreamingFileSink processing flow is a four-level structure, each level delegating to the next (a simplified call-chain sketch follows the list):

    * StreamingFileSink is the sink itself and defines the operator
    * StreamingFileSinkHelper is the helper class the sink uses to process records
    * Buckets manages all buckets
    * Bucket represents a single bucket
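
    As a roadmap, the per-record call chain through these four levels looks roughly like this (simplified; each method is walked through below):

    StreamingFileSink.invoke(value, context)
      -> StreamingFileSinkHelper.onElement(value, processingTime, timestamp, watermark)
        -> Buckets.onElement(...)        // resolve the bucket id, get or create the Bucket
          -> Bucket.write(value, time)   // roll the part file if needed, then write the record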

    A simple demo

    object StreamingFileDemo {
    
      def main(args: Array[String]): Unit = {
        // environment
        val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
        //    env.enableCheckpointing(TimeUnit.MINUTES.toMillis(10))
        env.setParallelism(2)
    
        // SimpleStringSource is a custom source (not shown here) that keeps emitting strings
        val source = env.addSource(new SimpleStringSource)
    
        val sink: StreamingFileSink[String] = StreamingFileSink
          // row-encoded format
          .forRowFormat(new Path("/out/tmp"), new SimpleStringEncoder[String]("UTF-8"))
          // bucket assigner: one bucket (directory) per day
          .withBucketAssigner(new DateTimeBucketAssigner[String]("yyyyMMdd", ZoneId.of("Asia/Shanghai")))
          // a "dt=" prefix lets a Hive table load the directory directly as a partition
          //.withBucketAssigner(new DateTimeBucketAssigner[String]("'dt='yyyyMMdd", ZoneId.of("Asia/Shanghai")))
          // rolling policy
          .withRollingPolicy(
            DefaultRollingPolicy.builder()
              // roll over every 2 minutes
              .withRolloverInterval(TimeUnit.MINUTES.toMillis(2))
              // roll after 2 minutes of inactivity
              .withInactivityInterval(TimeUnit.MINUTES.toMillis(2))
              // roll when the part file reaches 10 MB
              .withMaxPartSize(10 * 1024 * 1024)
              .build())
          .build()
    
        source.addSink(sink)
        env.execute("StreamingFileDemo")
    
      }
    }
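
    With this configuration the output directory will look roughly like the listing below (a sketch, assuming the default OutputFileConfig, i.e. prefix "part" and an empty suffix; in-progress files are hidden behind a dot prefix and a random suffix):

    /out/tmp/20201020/.part-0-0.inprogress.<uuid>   <- in-progress part file, still being written
    /out/tmp/20201020/part-0-0                      <- finished part file, committed after a successful checkpoint
    /out/tmp/20201021/.part-1-4.inprogress.<uuid>   <- next day's bucket, written by subtask 1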

    Writing data

    Skipping the generic operator logic for handling input records, we go straight into StreamingFileSink.

    When StreamingFileSink is initialized, StreamingFileSink.initializeState creates the Buckets object and passes it into the StreamingFileSinkHelper constructor to build the StreamingFileSinkHelper:

    public void initializeState(FunctionInitializationContext context) throws Exception {
        this.helper = new StreamingFileSinkHelper<>(
            bucketsBuilder.createBuckets(getRuntimeContext().getIndexOfThisSubtask()),
            context.isRestored(),
            context.getOperatorStateStore(),
            ((StreamingRuntimeContext) getRuntimeContext()).getProcessingTimeService(),
            bucketCheckInterval);
      }

    StreamingFileSink processes input records in invoke.

    StreamingFileSink.invoke

    @Override
      public void invoke(IN value, SinkFunction.Context context) throws Exception {
        this.helper.onElement(
          value,
          context.currentProcessingTime(),
          context.timestamp(),
          context.currentWatermark());
      }

    As the code shows, invoke simply delegates to StreamingFileSinkHelper.onElement, passing the record, the current processing time, the record timestamp (null if the record carries no timestamp), and the current watermark.

    StreamingFileSinkHelper

    // triggered by the processing-time timer
    @Override
    public void onProcessingTime(long timestamp) throws Exception {
      final long currentTime = procTimeService.getCurrentProcessingTime();
      buckets.onProcessingTime(currentTime);
      procTimeService.registerTimer(currentTime + bucketCheckInterval, this);
    }
    // triggered by an input record
    public void onElement(
        IN value,
        long currentProcessingTime,
        @Nullable Long elementTimestamp,
        long currentWatermark) throws Exception {
      buckets.onElement(value, currentProcessingTime, elementTimestamp, currentWatermark);
    }

    As the code shows, StreamingFileSinkHelper.onElement also just delegates to buckets.onElement. The helper additionally implements the ProcessingTimeCallback interface and uses a timer to call buckets.onProcessingTime every bucketCheckInterval milliseconds, which is what rolls part files on processing time (note there is no event-time based rolling).
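
    Where does the timer chain start? A rough sketch of the StreamingFileSinkHelper constructor (abbreviated, based on Flink 1.11 — field names may differ slightly) shows the initial registration; every onProcessingTime callback then re-registers itself, as seen above:

    public StreamingFileSinkHelper(
        Buckets<IN, ?> buckets,
        boolean isRestored,
        OperatorStateStore stateStore,
        ProcessingTimeService procTimeService,
        long bucketCheckInterval) throws Exception {
      // ... list-state handles are created here and, if isRestored is true,
      // buckets.initializeState(...) restores the previously active buckets ...
      this.buckets = buckets;
      this.procTimeService = procTimeService;
      this.bucketCheckInterval = bucketCheckInterval;

      // register the first processing-time timer; onProcessingTime keeps the chain going
      long currentProcessingTime = procTimeService.getCurrentProcessingTime();
      procTimeService.registerTimer(currentProcessingTime + bucketCheckInterval, this);
    }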

    Buckets

    @VisibleForTesting
      public Bucket<IN, BucketID> onElement(
          final IN value,
          final SinkFunction.Context context) throws Exception {
        return onElement(
            value,
            context.currentProcessingTime(),
            context.timestamp(),
            context.currentWatermark());
      }
    
      public Bucket<IN, BucketID> onElement(
          final IN value,
          final long currentProcessingTime,
          @Nullable final Long elementTimestamp,
          final long currentWatermark) throws Exception {
        // setting the values in the bucketer context
        bucketerContext.update(
            elementTimestamp,
            currentWatermark,
            currentProcessingTime);
        // resolve the bucket id
        final BucketID bucketId = bucketAssigner.getBucketId(value, bucketerContext);
        // get or create the bucket for this bucket id
        final Bucket<IN, BucketID> bucket = getOrCreateBucketForBucketId(bucketId);
        // write the record to the bucket
        bucket.write(value, currentProcessingTime);
    
        // we update the global max counter here because as buckets become inactive and
        // get removed from the list of active buckets, at the time when we want to create
        // another part file for the bucket, if we start from 0 we may overwrite previous parts.
        // track the global max part-file counter
        this.maxPartCounter = Math.max(maxPartCounter, bucket.getPartCounter());
        return bucket;
      }
    
      private Bucket<IN, BucketID> getOrCreateBucketForBucketId(final BucketID bucketId) throws IOException {
        // look up an active bucket; create one if it does not exist yet
        Bucket<IN, BucketID> bucket = activeBuckets.get(bucketId);
        if (bucket == null) {
          final Path bucketPath = assembleBucketPath(bucketId);
          bucket = bucketFactory.getNewBucket(
              subtaskIndex,
              bucketId,
              bucketPath,
              maxPartCounter,
              bucketWriter,
              rollingPolicy,
              outputFileConfig);
          activeBuckets.put(bucketId, bucket);
          notifyBucketCreate(bucket);
        }
        return bucket;
      }
      // called by the processing-time timer
      public void onProcessingTime(long timestamp) throws Exception {
        // call onProcessingTime on every active bucket
        for (Bucket<IN, BucketID> bucket : activeBuckets.values()) {
          bucket.onProcessingTime(timestamp);
        }
      }

    Here we can see the flow for an incoming record: resolve the bucket id, get (or create) the Bucket object, and call bucket.write to write the record (finally, the actual write).

    // write a record
    void write(IN element, long currentTime) throws IOException {
        // check whether the current part file has to be rolled first, based on its size
        // (shouldRollOnEvent corresponds to the 10 MB configured in the demo)
        if (inProgressPart == null || rollingPolicy.shouldRollOnEvent(inProgressPart, element)) {

          if (LOG.isDebugEnabled()) {
            LOG.debug("Subtask {} closing in-progress part file for bucket id={} due to element {}.",
                subtaskIndex, bucketId, element);
          }
          // roll the part file
          rollPartFile(currentTime);
        }
        // write the record to the in-progress part file
        inProgressPart.write(element, currentTime);
      }
      // roll the part file
      private void rollPartFile(final long currentTime) throws IOException {
        // close the current part file
        closePartFile();
        // build the path of the new part file
        final Path partFilePath = assembleNewPartPath();
        // open a writer for the new in-progress part file
        inProgressPart = bucketWriter.openNewInProgressFile(bucketId, partFilePath, currentTime);

        if (LOG.isDebugEnabled()) {
          LOG.debug("Subtask {} opening new part file \"{}\" for bucket id={}.",
              subtaskIndex, partFilePath.getName(), bucketId);
        }
        // increment the part counter
        partCounter++;
      }

      private Path assembleNewPartPath() {
        return new Path(bucketPath, outputFileConfig.getPartPrefix() + '-' + subtaskIndex + '-' + partCounter + outputFileConfig.getPartSuffix());
      }
      // close the part file
      private InProgressFileWriter.PendingFileRecoverable closePartFile() throws IOException {
        InProgressFileWriter.PendingFileRecoverable pendingFileRecoverable = null;
        if (inProgressPart != null) {
          // close the file for commit
          pendingFileRecoverable = inProgressPart.closeForCommit();
          // add it to the pending-commit list of the current checkpoint
          pendingFileRecoverablesForCurrentCheckpoint.add(pendingFileRecoverable);
          inProgressPart = null;
        }
        return pendingFileRecoverable;
      }
      // called by the processing-time timer: roll the part file based on processing time
      void onProcessingTime(long timestamp) throws IOException {
        if (inProgressPart != null && rollingPolicy.shouldRollOnProcessingTime(inProgressPart, timestamp)) {
          if (LOG.isDebugEnabled()) {
            LOG.debug("Subtask {} closing in-progress part file for bucket id={} due to processing time rolling policy " +
                "(in-progress file created @ {}, last updated @ {} and current time is {}).",
                subtaskIndex, bucketId, inProgressPart.getCreationTime(), inProgressPart.getLastUpdateTime(), timestamp);
          }
          closePartFile();
        }
      }

    That covers rolling and writing. How the data actually reaches the file system depends on the execution environment: when the Buckets object is created, a RowWiseBucketWriter is built on top of the RecoverableWriter that the FileSystem creates for the given base path.

    @Internal
            @Override
            public Buckets<IN, BucketID> createBuckets(int subtaskIndex) throws IOException {
                return new Buckets<>(
                        basePath,
                        bucketAssigner,
                        bucketFactory,
                        new RowWiseBucketWriter<>(FileSystem.get(basePath.toUri()).createRecoverableWriter(), encoder),
                        rollingPolicy,
                        subtaskIndex,
                        outputFileConfig);
            }

    Checkpoint snapshotState

    Quoting the official documentation:
    ```
    IMPORTANT: Checkpointing needs to be enabled when using the StreamingFileSink. Part files can only be finalized on successful checkpoints.

    If checkpointing is disabled part files will forever stay in `in-progress` or `pending` state and cannot be safely read by downstream systems.
    ```

    In short: checkpointing must be enabled, otherwise part files never reach the finished state (the data in them is complete, but the file status is not), so checkpointing is mandatory.
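
    For reference, a minimal sketch of enabling checkpointing (the Java equivalent of the commented-out env.enableCheckpointing line in the Scala demo above; the 10-minute interval is just an example):

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // checkpoint every 10 minutes so that pending part files can be committed
    // when the checkpoint completes
    env.enableCheckpointing(TimeUnit.MINUTES.toMillis(10));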

    The checkpoint flow starts when the operator receives the snapshot message from the CheckpointCoordinator. Skipping StreamingFileSink/StreamingFileSinkHelper, we look directly at Buckets.snapshotState.
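
    For completeness, the two skipped levels are plain delegation. Roughly (a sketch based on Flink 1.11, abbreviated; field names may differ slightly):

    // StreamingFileSink
    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
      this.helper.snapshotState(context.getCheckpointId());
    }

    // StreamingFileSinkHelper
    public void snapshotState(long checkpointId) throws Exception {
      buckets.snapshotState(checkpointId, bucketStates, maxPartCountersState);
    }

    And Buckets.snapshotState itself: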

    public void snapshotState(
          final long checkpointId,
          final ListState<byte[]> bucketStatesContainer,
          final ListState<Long> partCounterStateContainer) throws Exception {
    
        Preconditions.checkState(
          bucketWriter != null && bucketStateSerializer != null,
            "sink has not been initialized");
    
        LOG.info("Subtask {} checkpointing for checkpoint with id={} (max part counter={}).",
            subtaskIndex, checkpointId, maxPartCounter);
    
        bucketStatesContainer.clear();
        partCounterStateContainer.clear();
    
        snapshotActiveBuckets(checkpointId, bucketStatesContainer);
        partCounterStateContainer.add(maxPartCounter);
      }
    
      private void snapshotActiveBuckets(
          final long checkpointId,
          final ListState<byte[]> bucketStatesContainer) throws Exception {
        // snapshot the state of every active bucket and put it into bucketStatesContainer
        for (Bucket<IN, BucketID> bucket : activeBuckets.values()) {
          final BucketState<BucketID> bucketState = bucket.onReceptionOfCheckpoint(checkpointId);
    
          final byte[] serializedBucketState = SimpleVersionedSerialization
              .writeVersionAndSerialize(bucketStateSerializer, bucketState);
    
          bucketStatesContainer.add(serializedBucketState);
    
          if (LOG.isDebugEnabled()) {
            LOG.debug("Subtask {} checkpointing: {}", subtaskIndex, bucketState);
          }
        }
      }

    Bucket

    BucketState<BucketID> onReceptionOfCheckpoint(long checkpointId) throws IOException {
        prepareBucketForCheckpointing(checkpointId);
    
        InProgressFileWriter.InProgressFileRecoverable inProgressFileRecoverable = null;
        long inProgressFileCreationTime = Long.MAX_VALUE;
    
        if (inProgressPart != null) {
          // persist (flush) the in-progress file so it can be resumed after a restore
          inProgressFileRecoverable = inProgressPart.persist();
          inProgressFileCreationTime = inProgressPart.getCreationTime();
          this.inProgressFileRecoverablesPerCheckpoint.put(checkpointId, inProgressFileRecoverable);
        }
    
        return new BucketState<>(bucketId, bucketPath, inProgressFileCreationTime, inProgressFileRecoverable, pendingFileRecoverablesPerCheckpoint);
      }
    
      private void prepareBucketForCheckpointing(long checkpointId) throws IOException {
        // ask the rolling policy whether the in-progress part file should be rolled on checkpoint
        if (inProgressPart != null && rollingPolicy.shouldRollOnCheckpoint(inProgressPart)) {
          if (LOG.isDebugEnabled()) {
            LOG.debug("Subtask {} closing in-progress part file for bucket id={} on checkpoint.", subtaskIndex, bucketId);
          }
          closePartFile();
        }
    
        if (!pendingFileRecoverablesForCurrentCheckpoint.isEmpty()) {
          // move the pending files collected since the last checkpoint into pendingFileRecoverablesPerCheckpoint
          pendingFileRecoverablesPerCheckpoint.put(checkpointId, pendingFileRecoverablesForCurrentCheckpoint);
          pendingFileRecoverablesForCurrentCheckpoint = new ArrayList<>();
        }
      }

    The state of every active bucket ends up in bucketStatesContainer (a ListState).

    Checkpoint notifyCheckpointComplete

    When the operator receives the checkpoint-complete notification from the CheckpointCoordinator, again skipping StreamingFileSink/StreamingFileSinkHelper, we look directly at Buckets.commitUpToCheckpoint.
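
    Again, the skipped levels are plain delegation. Roughly (a sketch based on Flink 1.11):

    // StreamingFileSink (CheckpointListener)
    @Override
    public void notifyCheckpointComplete(long checkpointId) throws Exception {
      this.helper.commitUpToCheckpoint(checkpointId);
    }

    // StreamingFileSinkHelper
    public void commitUpToCheckpoint(long checkpointId) throws Exception {
      buckets.commitUpToCheckpoint(checkpointId);
    }

    And Buckets.commitUpToCheckpoint: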

      public void commitUpToCheckpoint(final long checkpointId) throws IOException {
        final Iterator<Map.Entry<BucketID, Bucket<IN, BucketID>>> activeBucketIt =
            activeBuckets.entrySet().iterator();
    
        LOG.info("Subtask {} received completion notification for checkpoint with id={}.", subtaskIndex, checkpointId);
        // call onSuccessfulCompletionOfCheckpoint on every active bucket
        while (activeBucketIt.hasNext()) {
          final Bucket<IN, BucketID> bucket = activeBucketIt.next().getValue();
          bucket.onSuccessfulCompletionOfCheckpoint(checkpointId);
    
          if (!bucket.isActive()) {
            // We've dealt with all the pending files and the writer for this bucket is not currently open.
            // Therefore this bucket is currently inactive and we can remove it from our state.
            activeBucketIt.remove();
            notifyBucketInactive(bucket);
          }
        }
      }

    Bucket

    void onSuccessfulCompletionOfCheckpoint(long checkpointId) throws IOException {
        checkNotNull(bucketWriter);
    
        Iterator<Map.Entry<Long, List<InProgressFileWriter.PendingFileRecoverable>>> it =
            pendingFileRecoverablesPerCheckpoint.headMap(checkpointId, true)
                .entrySet().iterator();
    
        while (it.hasNext()) {
          Map.Entry<Long, List<InProgressFileWriter.PendingFileRecoverable>> entry = it.next();
          // iterate over the pending files of every checkpoint up to and including this one
          for (InProgressFileWriter.PendingFileRecoverable pendingFileRecoverable : entry.getValue()) {
            // recover the pending file and commit it; the file is renamed to its finished form (the pending/in-progress markers are dropped)
            bucketWriter.recoverPendingFile(pendingFileRecoverable).commit();
          }
          it.remove();
        }
        // clean up in-progress-file recoverables left over from earlier checkpoints
        cleanupInProgressFileRecoverables(checkpointId);
      }
    
      private void cleanupInProgressFileRecoverables(long checkpointId) throws IOException {
        Iterator<Map.Entry<Long, InProgressFileWriter.InProgressFileRecoverable>> it =
            inProgressFileRecoverablesPerCheckpoint.headMap(checkpointId, false)
                .entrySet().iterator();
    
        while (it.hasNext()) {
          final InProgressFileWriter.InProgressFileRecoverable inProgressFileRecoverable = it.next().getValue();
    
          // this check is redundant, as we only put entries in the inProgressFileRecoverablesPerCheckpoint map
          // list when the requiresCleanupOfInProgressFileRecoverableState() returns true, but having it makes
          // the code more readable.
    
          final boolean successfullyDeleted = bucketWriter.cleanupInProgressFileRecoverable(inProgressFileRecoverable);
          if (LOG.isDebugEnabled() && successfullyDeleted) {
            LOG.debug("Subtask {} successfully deleted incomplete part for bucket id={}.", subtaskIndex, bucketId);
          }
          it.remove();
        }
      }

    bucketWriter.recoverPendingFile(pendingFileRecoverable).commit() behaves differently depending on the environment:

    Local file system: RowWiseBucketWriter --> LocalCommitter --> LocalCommitter.commit, which simply renames the file locally

    HDFS: RowWiseBucketWriter --> RecoverableFsDataOutputStream.Committer --> HadoopFsCommitter.commit, which renames the file via fs.rename

    For consistency reasons StreamingFileSink was designed to require checkpointing. Some jobs, however, do not really care about that level of consistency: they can tolerate losing a bit of data and, above all, do not want to enable checkpointing (for example real-time detection scenarios that only care about the last few minutes of data; if the job fails it is simply restarted from scratch instead of being rolled back to the last checkpoint).

    For such jobs, committing the file directly in Bucket.closePartFile and not adding it to the checkpoint state commits each file at the moment the part file is rolled, as in the snippet below (a fuller sketch follows it):

    bucketWriter.recoverPendingFile(pendingFileRecoverable).commit();
    // not add to checkpoint
    // pendingFileRecoverablesForCurrentCheckpoint.add(pendingFileRecoverable);
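
    Put together, a sketch of what the modified Bucket.closePartFile could look like (based on the version quoted above; this is a workaround idea, not upstream Flink behavior, and it gives up the exactly-once guarantee because files are committed even if the records in them are later replayed):

    private InProgressFileWriter.PendingFileRecoverable closePartFile() throws IOException {
      InProgressFileWriter.PendingFileRecoverable pendingFileRecoverable = null;
      if (inProgressPart != null) {
        // close the file for commit
        pendingFileRecoverable = inProgressPart.closeForCommit();
        // commit right away so the file reaches its finished state without a checkpoint
        bucketWriter.recoverPendingFile(pendingFileRecoverable).commit();
        // no longer added to the per-checkpoint pending list
        // pendingFileRecoverablesForCurrentCheckpoint.add(pendingFileRecoverable);
        inProgressPart = null;
      }
      return pendingFileRecoverable;
    }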

    Feel free to follow the Flink 菜鸟 WeChat official account, which publishes occasional posts on Flink development.

  • Original post: https://www.cnblogs.com/Springmoon-venn/p/13847868.html