• Castled Source Code Analysis: the container Module


    The container module is Castled's API backend service. Besides the REST API it also covers task
    scheduling and database migrations. A few services matter most: PipelineService, ExternalAppService,
    and WarehouseService. The project additionally ships an event-based processing layer, built mainly
    around PipelineEvent and CastledEvent; the rest is largely Dropwizard-based REST API code, and the
    code base as a whole is not hard to follow. PipelineService is the core piece: it ties an external
    app to a connector, and it makes use of the event and task machinery. PipelineExecutor is the task
    in which the actual data handling runs (polling data from the source and sending it to the
    destination both happen here).
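
    The post does not show the event classes themselves, so here is a minimal, hypothetical sketch of
    what such an event hierarchy with a dispatcher could look like. Only the names CastledEvent and
    PipelineEvent come from the Castled sources; EventHandler, EventBus, and the fields are assumptions
    made purely for illustration:

    // Hypothetical sketch, not Castled code: only the names CastledEvent and
    // PipelineEvent appear in the sources, the rest is assumed.
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    abstract class CastledEvent { }               // base type for internal events

    class PipelineEvent extends CastledEvent {    // an event tied to one pipeline
        final Long pipelineId;
        PipelineEvent(Long pipelineId) { this.pipelineId = pipelineId; }
    }

    interface EventHandler<T extends CastledEvent> {  // per-type subscriber
        void handle(T event);
    }

    class EventBus {                              // dispatches events to registered handlers
        private final Map<Class<?>, List<EventHandler<?>>> handlers = new HashMap<>();

        <T extends CastledEvent> void register(Class<T> type, EventHandler<T> handler) {
            handlers.computeIfAbsent(type, k -> new ArrayList<>()).add(handler);
        }

        @SuppressWarnings("unchecked")
        <T extends CastledEvent> void publish(T event) {
            for (EventHandler<?> handler : handlers.getOrDefault(event.getClass(), List.of())) {
                ((EventHandler<T>) handler).handle(event);  // safe: handlers keyed by event class
            }
        }
    }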

    Core methods of PipelineService

    (The original post includes a diagram of the PipelineService core methods here; the image is not reproduced.)

    PipelineExecutor processing

     
    public String executeTask(Task task) {
        Long pipelineId = ((Number) task.getParams().get(CommonConstants.PIPELINE_ID)).longValue();
        Pipeline pipeline = this.pipelineService.getActivePipeline(pipelineId);
        if (pipeline == null) {
            return null;
        }
        // Important: this listener tracks records that fail to sync, feeding the run statistics.
        WarehouseSyncFailureListener warehouseSyncFailureListener = null;
        Warehouse warehouse = this.warehouseService.getWarehouse(pipeline.getWarehouseId());
        PipelineRun pipelineRun = getOrCreatePipelineRun(pipelineId);
        WarehousePollContext warehousePollContext = WarehousePollContext.builder()
                .primaryKeys(PipelineUtils.getWarehousePrimaryKeys(pipeline)).pipelineUUID(pipeline.getUuid())
                .pipelineRunId(pipelineRun.getId()).warehouseConfig(warehouse.getConfig())
                .dataEncryptionKey(encryptionManager.getEncryptionKey(warehouse.getTeamId()))
                .queryMode(pipeline.getQueryMode())
                .query(pipeline.getSourceQuery()).pipelineId(pipeline.getId()).build();
        try {
            // Invoke the warehouse connector and poll the source records.
            WarehouseExecutionContext warehouseExecutionContext = pollRecords(warehouse, pipelineRun, warehousePollContext);

            log.info("Poll records completed for pipeline {}", pipeline.getName());
            this.pipelineService.updatePipelineRunstage(pipelineRun.getId(), PipelineRunStage.RECORDS_POLLED);

            // Resolve the destination app and its connector, then fetch the app-side schema.
            ExternalApp externalApp = externalAppService.getExternalApp(pipeline.getAppId());
            ExternalAppConnector externalAppConnector = this.externalAppConnectors.get(externalApp.getType());
            RecordSchema appSchema = externalAppConnector.getSchema(externalApp.getConfig(), pipeline.getAppSyncConfig())
                    .getAppSchema();

            log.info("App schema fetch completed for pipeline {}", pipeline.getName());

            warehousePollContext.setWarehouseSchema(warehouseExecutionContext.getWarehouseSchema());
            warehouseSyncFailureListener = warehouseConnectors.get(warehouse.getType())
                    .syncFailureListener(warehousePollContext);

            MysqlErrorTracker mysqlErrorTracker = new MysqlErrorTracker(warehousePollContext);

            ErrorOutputStream schemaMappingErrorOutputStream = new ErrorOutputStream(warehouseSyncFailureListener, mysqlErrorTracker);

            // Map warehouse records onto the app schema; mapping failures go to the error stream.
            SchemaMappedMessageInputStream schemaMappedMessageInputStream = new SchemaMappedMessageInputStream(
                    appSchema, warehouseExecutionContext.getMessageInputStreamImpl(), pipeline.getDataMapping().appWarehouseMapping(),
                    pipeline.getDataMapping().warehouseAppMapping(), schemaMappingErrorOutputStream);

            SchemaMappedRecordOutputStream schemaMappedRecordOutputStream =
                    new SchemaMappedRecordOutputStream(SchemaUtils.filterSchema(warehousePollContext.getWarehouseSchema(),
                            PipelineUtils.getWarehousePrimaryKeys(pipeline)), warehouseSyncFailureListener,
                            pipeline.getDataMapping().warehouseAppMapping());

            ErrorOutputStream sinkErrorOutputStream = new ErrorOutputStream(schemaMappedRecordOutputStream,
                    new SchemaMappedErrorTracker(mysqlErrorTracker, warehouseExecutionContext.getWarehouseSchema(), pipeline.getDataMapping().warehouseAppMapping()));

            log.info("App Sync started for pipeline {}", pipeline.getName());

            List<String> mappedAppFields = pipeline.getDataMapping().getFieldMappings().stream().filter(mapping -> !mapping.isSkipped())
                    .map(FieldMapping::getAppField).collect(Collectors.toList());

            DataSinkRequest dataSinkRequest = DataSinkRequest.builder().externalApp(externalApp).errorOutputStream(sinkErrorOutputStream)
                    .appSyncConfig(pipeline.getAppSyncConfig()).mappedFields(mappedAppFields)
                    .objectSchema(appSchema).primaryKeys(pipeline.getDataMapping().getPrimaryKeys())
                    .messageInputStream(schemaMappedMessageInputStream)
                    .build();

            // The actual sync goes through MonitoredDataSink, which also collects run statistics.
            PipelineSyncStats pipelineSyncStats = monitoredDataSink.syncRecords(externalAppConnector.getDataSink(),
                    pipelineRun.getPipelineSyncStats(), pipelineRun.getId(), dataSinkRequest);

            schemaMappedMessageInputStream.close();

            log.info("App Sync completed for pipeline {}", pipeline.getName());
            // Flush the error output streams.
            schemaMappingErrorOutputStream.flushFailedRecords();
            sinkErrorOutputStream.flushFailedRecords();

            warehouseConnectors.get(warehouse.getType()).getDataPoller().cleanupPipelineRunResources(warehousePollContext);
            // Also add the records that failed the schema-mapping phase to the final stats.
            pipelineSyncStats.setRecordsFailed(schemaMappedMessageInputStream.getFailedRecords() + pipelineSyncStats.getRecordsFailed());
            this.pipelineService.markPipelineRunProcessed(pipelineRun.getId(), pipelineSyncStats);

        } catch (Exception e) {
            if (ObjectRegistry.getInstance(AppShutdownHandler.class).isShutdownTriggered()) {
                throw new PipelineInterruptedException();
            }
            this.pipelineService.markPipelineRunFailed(pipelineRun.getId(), Optional.ofNullable(e.getMessage()).orElse("Unknown Error"));
            log.error("Pipeline run failed for pipeline {} ", pipeline.getId(), e);
            this.warehouseConnectors.get(warehouse.getType()).getDataPoller().cleanupPipelineRunResources(warehousePollContext);
            Optional.ofNullable(warehouseSyncFailureListener).ifPresent(syncFailureListener ->
                    syncFailureListener.cleanupResources(pipeline.getUuid(), pipelineRun.getId(), warehouse.getConfig()));

            if (e instanceof PipelineExecutionException) {
                handlePipelineExecutionException(pipeline, (PipelineExecutionException) e);
            } else {
                log.error("Pipeline run failed for pipeline {} ", pipeline.getId(), e);
            }
        }
        return null;
    }
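
    To summarize the flow above: the executor resolves the active pipeline and its warehouse, polls the
    source records, fetches the destination app's schema, wires the schema-mapping input and output
    streams together with error streams that capture failed records, pushes the records through
    MonitoredDataSink, flushes the failed records, and finally marks the run processed. On any exception,
    a shutdown-triggered interruption is rethrown as PipelineInterruptedException; otherwise the run is
    marked failed, the poller's resources are cleaned up, and the failure listener gets a chance to
    clean up as well.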

    Notes

    From the code we can see that every task that gets created sends a message to Castled's statistics
    service. If you do not want that, you will have to deal with it yourself: judging by the configuration
    definitions, there is currently no switch to disable it.
    Although the system uses Kafka, Kafka's role is not very prominent: it serves more as a task queue
    than as a basis for message-driven processing.
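
    As a rough illustration of the task-queue usage of Kafka described above, here is a generic sketch
    (not Castled's actual code; the topic name pipeline-tasks, the group id, and the worker class are
    all made up). When every worker shares one group.id, each task message is delivered to exactly one
    worker, which gives queue semantics rather than publish/subscribe:

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    public class TaskQueueWorker {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            // All workers share one group id, so each task message goes to exactly
            // one worker: queue semantics rather than publish/subscribe.
            props.put("group.id", "castled-task-workers");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("enable.auto.commit", "false");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("pipeline-tasks")); // hypothetical topic
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> record : records) {
                        // The payload would carry the task parameters, e.g. a pipeline id;
                        // the worker executes the task synchronously, as PipelineExecutor does.
                        System.out.printf("executing task %s%n", record.value());
                    }
                    consumer.commitSync(); // acknowledge the batch only after the tasks ran
                }
            }
        }
    }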

    References

    https://github.com/castledio/castled
