• SSIS ->> Control Flow And Data Flow


    In the Control Flow, the task is the smallest unit of work, and a task requires completion (success, failure, or just completion) before subsequent tasks are handled. 

    • Workflow orchestration
    • Process-oriented
    • Serial or parallel task execution
    • Synchronous processing 
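
    The completion rule above can be sketched in plain Python. This is only an illustrative model, not SSIS code: `run_control_flow`, the task names, and the constraint labels ("success", "failure", "completion") are hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical model: task names, constraint labels and run_control_flow
# are illustrative only and do not correspond to any SSIS API.
Constraint = Tuple[str, str, str]  # (upstream task, downstream task, required outcome)


def run_control_flow(tasks: Dict[str, Callable[[], bool]],
                     constraints: List[Constraint]) -> List[str]:
    """Run tasks one at a time and return the names of those that executed."""
    outcome: Dict[str, str] = {}  # "success", "failure" or "skipped"
    executed: List[str] = []
    pending = list(tasks)
    while pending:
        # Pick the first task whose upstream tasks have all been handled.
        for name in pending:
            upstream = [(src, kind) for src, dst, kind in constraints if dst == name]
            if all(src in outcome for src, _ in upstream):
                break
        else:
            raise ValueError("precedence constraints contain a cycle")
        pending.remove(name)
        # "completion" accepts success or failure; a skipped upstream blocks the task.
        satisfied = all(
            outcome[src] != "skipped" and kind in ("completion", outcome[src])
            for src, kind in upstream
        )
        if not satisfied:
            outcome[name] = "skipped"
            continue
        outcome[name] = "success" if tasks[name]() else "failure"
        executed.append(name)
    return executed


tasks = {
    "truncate_staging": lambda: True,
    "load_staging": lambda: False,      # simulate a failing task
    "send_failure_mail": lambda: True,
    "archive_file": lambda: True,
}
constraints = [
    ("truncate_staging", "load_staging", "success"),
    ("load_staging", "send_failure_mail", "failure"),
    ("load_staging", "archive_file", "success"),
]
executed = run_control_flow(tasks, constraints)
```

    Here `archive_file` never runs, because its upstream task finished with a failure while the constraint required success.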

    In the Data Flow, the transformation and the adapter are the basic components:

    • Information-oriented
    • Data correlation and transformation
    • Coordinated processing
    • Streaming in nature
    • Source extraction and destination loading

    Multiple components are running at the same time because the Data Flow Transformations are working together in a coordinated streaming fashion, and the data is being transformed in groups (called buffers) as it is passed down from the source to the subsequent transformations.

    • Data buffer architecture
    • Transformation types
    • Transformation communication
    • Execution tree

    Instead of data being passed down through the transformations, groups of transformations pass over the buffers of data and make in-place changes as defined by the transformations.
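
    A minimal sketch of that idea, assuming a buffer is just a Python list of row dictionaries. The names `run_data_flow`, `trim_name`, and `add_line_total` are hypothetical; the real engine manages typed memory buffers, which this does not attempt to model.

```python
from typing import Any, Callable, Dict, Iterator, List

Row = Dict[str, Any]
Buffer = List[Row]


def trim_name(buffer: Buffer) -> None:
    # Changes the rows in place; nothing is copied.
    for row in buffer:
        row["name"] = row["name"].strip()


def add_line_total(buffer: Buffer) -> None:
    # Adds a column to the rows that are already in the buffer.
    for row in buffer:
        row["line_total"] = row["qty"] * row["price"]


def run_data_flow(rows: List[Row],
                  transformations: List[Callable[[Buffer], None]],
                  rows_per_buffer: int) -> Iterator[Buffer]:
    """Group rows into buffers and let each transformation pass over a buffer."""
    for start in range(0, len(rows), rows_per_buffer):
        buffer = rows[start:start + rows_per_buffer]  # same row objects, grouped
        for transform in transformations:
            transform(buffer)
        yield buffer


source_rows = [
    {"name": " ann ", "qty": 2, "price": 5.0},
    {"name": "bob", "qty": 1, "price": 3.0},
    {"name": " cy", "qty": 4, "price": 1.5},
]
buffers = list(run_data_flow(source_rows, [trim_name, add_line_total], rows_per_buffer=2))
```

    The rows that come out are the same objects that went in; only their contents changed.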

    • Blocking nature: non-blocking (sometimes called streaming), semi-blocking, blocking
    • Communication mechanism: synchronous and asynchronous

    All transformations fall into one of three categories: non-blocking, semi-blocking, or blocking. These terms describe whether data in a transformation is passed downstream in the pipeline immediately, in increments, or after all the data is fully received.

    Non-Blocking, Streaming, and Row-Based Transformations 

    Most of the SSIS transformations are non-blocking. This means that the logic applied in the transformation does not impede the data from moving on to the next transformation once that logic has been applied to the row. Two categories of non-blocking transformations exist: streaming and row-based. The difference is whether the SSIS transformation can use internal information and processes to handle its work or whether it has to call an external process to retrieve the information it needs. Some transformations can be categorized as either streaming or row-based depending on their configuration; these are noted in the lists below.

    Streaming transformations are usually able to apply transformation logic quickly, using precached data and processing calculations within the row being worked on. In these transformations, it is usually the case that a transformation will not slip behind the rate of the data being fed to it. These transformations focus their resources on the CPUs, which in most cases are not the bottleneck of an ETL system.

    Streaming:

    Audit

    Cache Transform

    Character Map

    Conditional Split

    Copy Column

    Data Conversion

    Derived Column

    Lookup (with a full-cache setting)

    Multicast

    Percent Sampling

    Row Count

    Script Component (provided the script is not configured with an asynchronous output)

    Union All (the Union All acts like a streaming transformation but is actually a semi-blocking transformation because it communicates asynchronously)

    Row-based:

    DQS Cleansing

    Export Column

    Import Column

    Lookup (with a no-cache or partial-cache setting)

    OLE DB Command

    Script Component (where the script interacts with an external component)

    Slowly Changing Dimension (each row is looked up against the dimension in the database)
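
    The streaming versus row-based distinction can be illustrated with the Lookup case. This is a hedged sketch in plain Python, not the Lookup component itself: `lookup_full_cache`, `lookup_no_cache`, and `query_customer` are hypothetical names, and the "external" call is simulated by a function that records each request.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def lookup_full_cache(rows: Iterable[Row], reference: Dict[int, str]) -> Iterator[Row]:
    """Streaming style: the reference data is loaded once, then each row is
    resolved from memory."""
    cache = dict(reference)  # precached before any row flows through
    for row in rows:
        row["customer_name"] = cache.get(row["customer_id"])
        yield row


def lookup_no_cache(rows: Iterable[Row], query: Callable[[int], Any]) -> Iterator[Row]:
    """Row-based style: every row triggers a call to an external process."""
    for row in rows:
        row["customer_name"] = query(row["customer_id"])
        yield row


customers = {1: "Ann", 2: "Bob"}
external_calls: List[int] = []


def query_customer(customer_id: int) -> Any:
    # Stand-in for a database round trip; records each request.
    external_calls.append(customer_id)
    return customers.get(customer_id)


streamed = list(lookup_full_cache(
    [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 1}], customers))
row_based = list(lookup_no_cache(
    [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 1}], query_customer))
```

    Both variants release each row as soon as it has been processed, so neither blocks; the row-based one simply pays for one external call per row.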

    Semi-Blocking Transformations

    These components hold up records in the Data Flow for a period of time before allowing the memory buffers to be passed downstream.

    Data Mining Query

    Merge

    Merge Join

    Pivot

    Term Lookup

    Unpivot

    Union All (also included in the streaming transformations list, but under the covers, the Union All is semi-blocking) 
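
    As an illustration of semi-blocking behavior, here is a sketch of merging two sorted inputs, in the spirit of the Merge transformation. It is an assumption-laden model using Python generators: `merge_sorted`, `source`, and `read_log` are hypothetical names. A row can only be released once a row from the other input is available to compare against, so rows are held briefly but not until the end of the input.

```python
from typing import Any, Callable, Iterable, Iterator, List

_END = object()  # sentinel marking an exhausted input


def merge_sorted(left: Iterable[Any], right: Iterable[Any],
                 key: Callable[[Any], Any]) -> Iterator[Any]:
    """Merge two inputs that are already sorted by `key`."""
    left_it, right_it = iter(left), iter(right)
    l, r = next(left_it, _END), next(right_it, _END)
    while l is not _END and r is not _END:
        # A row is released only after the other input has supplied a row.
        if key(l) <= key(r):
            yield l
            l = next(left_it, _END)
        else:
            yield r
            r = next(right_it, _END)
    while l is not _END:  # drain whatever is left on either side
        yield l
        l = next(left_it, _END)
    while r is not _END:
        yield r
        r = next(right_it, _END)


read_log: List[int] = []


def source(values: List[int]) -> Iterator[int]:
    # Records every row pulled from the source.
    for value in values:
        read_log.append(value)
        yield value


merged = merge_sorted(source([1, 3, 5]), source([2, 4, 6]), key=lambda v: v)
first = next(merged)
rows_read_before_first_output = len(read_log)
remaining = list(merged)
```

    The first output row appears after only one row has been read from each input, which is the "in increments" behavior described earlier.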

    SSIS 2012 can throttle the sources by limiting the requests from the upstream transformations and sources, thereby preventing SSIS from getting into an out-of-memory situation.

    Blocking Transformations 

    These components require a complete review of the upstream data before releasing any row downstream to the connected transformations and destinations.

    Aggregate

    Fuzzy Grouping

    Fuzzy Lookup

    Row Sampling

    Sort

    Term Extraction

    Script Component (when configured to receive all rows before sending any downstream)
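
    A blocking transformation can be sketched the same way. The example below models an Aggregate-style sum per group; `aggregate_sum`, `source`, and `read_log` are hypothetical names, and the point is only that no output row exists until every input row has been read.

```python
from typing import Any, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def aggregate_sum(rows: Iterable[Row], group_key: str, value_key: str) -> Iterator[Row]:
    """Sum `value_key` per `group_key`; output starts only after all input is read."""
    totals: Dict[Any, Any] = {}
    for row in rows:  # the whole input is consumed here
        totals[row[group_key]] = totals.get(row[group_key], 0) + row[value_key]
    for group, total in totals.items():
        yield {group_key: group, "total": total}


read_log: List[Row] = []


def source(rows: List[Row]) -> Iterator[Row]:
    # Records every row pulled from the source.
    for row in rows:
        read_log.append(row)
        yield row


sales = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 5},
    {"region": "east", "amount": 20},
]
aggregated = aggregate_sum(source(sales), "region", "amount")
first = next(aggregated)
rows_read_before_first_output = len(read_log)
rest = list(aggregated)
```

    All three input rows are read before the first total is released, because a later row could still change any group's sum.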

    Synchronous and Asynchronous Transformation Outputs 

    The terms synchronous and asynchronous refer more to the relationship between a component's input and output connections and the buffers they use.

    A transformation output is asynchronous if the buffers used in the input are different from the buffers used in the output. In other words, many of the transformations cannot both perform the specified operation and preserve the buffers (the number of rows or the order of the rows), so a copy of the data must be made to accomplish the desired effect.

    All the semi-blocking and blocking transformations already listed have asynchronous outputs by definition: none of them can pass input buffers on downstream, because the data is held up for processing and reorganized.

    A synchronous transformation is one in which the buffers are immediately handed off to the next downstream transformation at the completion of the transformation logic.

    Both the Multicast and the Conditional Split can have multiple outputs, but all the outputs are synchronous.

    The exception is the Union All, which functions like a streaming transformation but is really an asynchronous transformation.

    Synchronous transformation outputs preserve the sort order of incoming data, whereas some of the asynchronous transformations do not. The Sort, Merge, and Merge Join asynchronous components, of course, have sorted outputs because of their nature, but the Union All, for example, does not.
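
    The buffer relationship can be made concrete with object identity in Python. This is a rough analogy only: `derived_column` and `sort_by_amount` are hypothetical functions, and a buffer is modeled as a list of row dictionaries.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]
Buffer = List[Row]


def derived_column(buffer: Buffer) -> Buffer:
    """Synchronous output: the input buffer itself is handed downstream."""
    for row in buffer:
        row["amount_doubled"] = row["amount"] * 2
    return buffer


def sort_by_amount(buffers: List[Buffer]) -> List[Buffer]:
    """Asynchronous output: rows are copied into a new buffer in a new order."""
    all_rows = [dict(row) for buffer in buffers for row in buffer]
    return [sorted(all_rows, key=lambda row: row["amount"])]


input_buffer = [{"amount": 30}, {"amount": 10}, {"amount": 20}]
sync_output = derived_column(input_buffer)
async_output = sort_by_amount([input_buffer])
```

    The synchronous output is the very same buffer, so row count and order are preserved; the asynchronous output is a different buffer whose order is whatever the component produces.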

    An execution tree is a logical grouping of Data Flow Components (transformations and adapters) based on their synchronous relationship to one another. Groupings are delineated by asynchronous component outputs that indicate the completion of one execution tree and the start of the next. 
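
    A simplified sketch of this grouping, assuming a single linear path of components where each entry states whether the component's output is asynchronous. The function `execution_trees` and the "(output)" label are hypothetical, and the source adapter is simply treated as the start of the first tree.

```python
from typing import List, Tuple


def execution_trees(path: List[Tuple[str, bool]]) -> List[List[str]]:
    """Split a linear Data Flow path into groups at asynchronous outputs.

    Each entry is (component name, has_asynchronous_output).
    """
    trees: List[List[str]] = [[]]
    for name, has_async_output in path:
        # The component's input side belongs to the current tree.
        trees[-1].append(name)
        if has_async_output:
            # Its asynchronous output ends this tree and starts the next one.
            trees.append([name + " (output)"])
    return trees


path = [
    ("OLE DB Source", False),
    ("Derived Column", False),
    ("Sort", True),
    ("Lookup", False),
    ("Union All", True),
    ("OLE DB Destination", False),
]
trees = execution_trees(path)
```

    With this path the sketch yields three groups, split at the Sort and at the Union All, the two components with asynchronous outputs.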

    The process thread scheduler can assign more than one thread to a single execution tree if threads are available and the execution tree requires intense processor utilization. Each transformation can receive a single thread, so if an execution tree has only two components that participate, then the execution tree can have a maximum of two threads. In addition, each source adapter receives a separate thread.

    It is important to modify the EngineThreads property of the Data Flow so that the execution trees are not sharing process threads, and extra threads are available for large or complex execution trees. Furthermore, all the execution trees in a package share the number of processor threads allocated in the EngineThreads property of the Data Flow. A single thread or multiple threads are assigned to an execution tree based on availability of threads and complexity of the execution tree. 

    The value for EngineThreads does not include the threads allocated for the number of sources in a Data Flow, which are automatically allocated separate threads.

  • Original article: https://www.cnblogs.com/jenrrychen/p/4742205.html