• StreamSets Processors Overview


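    A processor represents one kind of data-processing operation, and a single pipeline can use several processors.
    Depending on the execution mode, processors fall into those for standalone pipelines, those for cluster pipelines, and those for edge (agent) pipelines,
    plus development processors that help with testing.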
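
    The idea that a pipeline applies several processors in sequence can be sketched in plain Python. This is a conceptual illustration only, not StreamSets code: the `Record` type, `remove_field`, `rename_field`, and `run_pipeline` are hypothetical names invented for this sketch.

```python
from typing import Callable, Dict, Iterable, List

# Assumption for this sketch: a record is a flat dict of field name -> value.
Record = Dict[str, object]
# A processor takes one record and yields zero or more output records.
Processor = Callable[[Record], Iterable[Record]]


def remove_field(name: str) -> Processor:
    """Hypothetical stand-in for a Field Remover-style step."""
    def step(record: Record) -> Iterable[Record]:
        yield {k: v for k, v in record.items() if k != name}
    return step


def rename_field(old: str, new: str) -> Processor:
    """Hypothetical stand-in for a Field Renamer-style step."""
    def step(record: Record) -> Iterable[Record]:
        out = dict(record)
        if old in out:
            out[new] = out.pop(old)
        yield out
    return step


def run_pipeline(records: Iterable[Record], processors: List[Processor]) -> List[Record]:
    """Pass a batch through each processor in order."""
    batch = list(records)
    for proc in processors:
        batch = [out for rec in batch for out in proc(rec)]
    return batch


result = run_pipeline(
    [{"id": 1, "pwd": "secret", "nm": "a"}],
    [remove_field("pwd"), rename_field("nm", "name")],
)
print(result)
```

    Each step sees only the output of the previous one, which is why the order of processors in a pipeline matters.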

    Standalone pipelines only

    • Record Deduplicator - Removes duplicate records.
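
    As a rough illustration of what deduplication involves, the sketch below hashes either the whole record or a chosen subset of fields and separates first occurrences from repeats. It is a minimal sketch under stated assumptions, not the processor's actual implementation: `deduplicate` and `compare_fields` are hypothetical names, and records are assumed to be JSON-serializable dicts.

```python
import hashlib
import json


def deduplicate(records, compare_fields=None):
    """Split records into (unique, duplicates).

    compare_fields=None compares the whole record; otherwise only the
    listed fields are compared. Both names are assumptions of this sketch.
    """
    seen = set()
    unique, duplicates = [], []
    for record in records:
        if compare_fields is None:
            key_data = record
        else:
            key_data = {f: record.get(f) for f in compare_fields}
        # sort_keys makes the hash independent of field order.
        payload = json.dumps(key_data, sort_keys=True, default=str)
        digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        if digest in seen:
            duplicates.append(record)
        else:
            seen.add(digest)
            unique.append(record)
    return unique, duplicates


records = [
    {"id": 1, "v": "a"},
    {"id": 1, "v": "b"},
    {"id": 2, "v": "a"},
]
unique, duplicates = deduplicate(records, compare_fields=["id"])
print(unique, duplicates)
```

    Keeping every hash in memory is the simplest approach; a long-running pipeline would need to bound that set by record count or time, which this sketch does not attempt.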

    Standalone and cluster pipelines

    • Aggregator - Performs aggregations and displays the results in Monitor mode and writes the results to events when enabled. This processor does not update the records being evaluated.
    • Base64 Field Decoder - Decodes Base64 encoded data to binary data.
    • Base64 Field Encoder - Encodes binary data using Base64.
    • Data Parser - Parses NetFlow or syslog data embedded in a field.
    • Delay - Delays passing a batch to the rest of the pipeline.
    • Expression Evaluator - Performs calculations on data. Can also add or modify record header attributes.
    • Field Flattener - Flattens nested fields.
    • Field Hasher - Uses an algorithm to encode sensitive data.
    • Field Masker - Masks sensitive string data.
    • Field Merger - Merges fields in complex lists or maps.
    • Field Order - Orders fields in a map or list-map root field type and outputs the fields into a list-map or list root field type.
    • Field Pivoter - Pivots data in a list, map, or list-map field and creates a record for each item in the field.
    • Field Remover - Removes fields from a record.
    • Field Renamer - Renames fields in a record.
    • Field Replacer - Replaces field values.
    • Field Splitter - Splits the string values in a field into different fields.
    • Field Type Converter - Converts the data types of fields.
    • Field Zip - Merges list data from two fields.
    • Geo IP - Returns geolocation and IP intelligence information for a specified IP address.
    • Groovy Evaluator - Processes records based on custom Groovy code.
    • HBase Lookup - Performs key-value lookups in HBase to enrich records with data.
    • Hive Metadata - Works with the Hive Metastore destination as part of the Drift Synchronization Solution for Hive.
    • HTTP Client - Sends requests to an HTTP resource URL and writes the results to a field.
    • JavaScript Evaluator - Processes records based on custom JavaScript code.
    • JDBC Lookup - Performs lookups in a database table through a JDBC connection.
    • JDBC Tee - Writes data to a database table through a JDBC connection, and enriches records with data from generated database columns.
    • JSON Generator - Serializes data from a field to a JSON-encoded string.
    • JSON Parser - Parses a JSON object embedded in a string field.
    • Jython Evaluator - Processes records based on custom Jython code.
    • Kudu Lookup - Performs lookups in Kudu to enrich records with data.
    • Log Parser - Parses log data in a field based on the specified log format.
    • PostgreSQL Metadata - Tracks structural changes in source data then creates and alters PostgreSQL tables as part of the Drift Synchronization Solution for PostgreSQL.
    • Redis Lookup - Performs key-value lookups in Redis to enrich records with data.
    • Salesforce Lookup - Performs lookups in Salesforce to enrich records with data.
    • Schema Generator - Generates a schema for each record and writes the schema to a record header attribute.
    • Spark Evaluator - Processes data based on a custom Spark application.
    • SQL Parser - Parses SQL queries in a string field.
    • Static Lookup - Performs key-value lookups in local memory.
    • Stream Selector - Routes data to different streams based on conditions.
    • Value Replacer (Deprecated) - Replaces existing nulls or specified values with constants or nulls.
    • Whole File Transformer - Transforms Avro files to Parquet.
    • XML Flattener - Flattens XML data in a string field.
    • XML Parser - Parses XML data in a string field.
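
    To make one of the entries above concrete, here is a minimal Python sketch of what flattening nested fields (as the Field Flattener entry describes) can look like. It is an illustration, not the processor's implementation: the `flatten` function, the "." separator, and the use of list indexes in flattened names are all assumptions of this sketch.

```python
def flatten(value, prefix="", separator="."):
    """Collapse nested dicts and lists into a single-level dict.

    Flattened names join the path segments with `separator`; list items
    use their index as the segment (a naming choice made for this sketch).
    """
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        # A scalar: nothing left to flatten.
        return {prefix: value}

    flat = {}
    for key, child in items:
        name = f"{prefix}{separator}{key}" if prefix else str(key)
        flat.update(flatten(child, name, separator))
    return flat


nested = {"a": {"b": 1, "c": [10, 20]}, "d": "x"}
flat = flatten(nested)
print(flat)
```

    Note that empty maps and lists simply disappear in this sketch; a real pipeline would need to decide whether to keep them as empty fields.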

    Edge pipelines

    • Expression Evaluator - Performs calculations on data. Can also add or modify record header attributes.
    • Field Remover - Removes fields from a record.
    • JavaScript Evaluator - Processes records based on custom JavaScript code.
    • Stream Selector - Routes data to different streams based on conditions.
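
    Condition-based routing, as in the Stream Selector entry, can be sketched as follows. This is a conceptual sketch: StreamSets conditions are written in its expression language, whereas the Python predicates, the `route` function, and the stream names here are hypothetical. The sketch assumes a record goes to every stream whose condition matches and to a default stream when none match.

```python
def route(records, conditions):
    """Route records to named streams.

    `conditions` maps a stream name to a predicate. A record is appended
    to every stream whose predicate returns True; records matching no
    predicate go to the "default" stream.
    """
    streams = {name: [] for name in conditions}
    streams["default"] = []
    for record in records:
        matched = False
        for name, predicate in conditions.items():
            if predicate(record):
                streams[name].append(record)
                matched = True
        if not matched:
            streams["default"].append(record)
    return streams


readings = [{"temp": 20}, {"temp": 35}, {"temp": 45}]
streams = route(
    readings,
    {
        "hot": lambda r: r["temp"] > 30,
        "alert": lambda r: r["temp"] > 40,
    },
)
print(streams)
```

    Because conditions are not mutually exclusive here, the 45-degree reading lands in both the "hot" and "alert" streams.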

    Development (testing) processors

    • Dev Identity
    • Dev Random Error
    • Dev Record Creator

    References

    https://streamsets.com/documentation/datacollector/latest/help/datacollector/UserGuide/Processors/Processors_overview.html#concept_hpr_twm_jq

  • Original article: https://www.cnblogs.com/rongfengliang/p/9509455.html