• Understanding the Parallelism of a Storm Topology


    Source: https://storm.apache.org/documentation/Understanding-the-parallelism-of-a-Storm-topology.html

    http://blog.csdn.net/derekjiang/article/details/9040243

    Concepts

    The original article uses a figure to illustrate how parallelism works for a topology running in a Storm cluster.



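    Put simply, when a topology runs in a Storm cluster, its parallelism involves three logical entities: worker, executor, and task.

    1. A worker is a process that runs on a worker node and is created by the Supervisor daemon to do the actual work. Each worker executes a subset of the tasks of one specific topology. Conversely, a single worker never runs tasks that belong to different topologies.

    2. An executor can be thought of as a thread inside a worker process. An executor runs only tasks that belong to the same component (spout/bolt). A worker process can contain one or more executor threads. By default, an executor runs one task.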

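    3. A task is the actual work performed by a spout or bolt. An executor can be responsible for one or more tasks. The parallelism hint of a component (spout/bolt) sets its initial number of executors; the number of tasks of a component stays fixed for the lifetime of the topology, while the number of executors can change. Tasks are also the unit of grouping (partitioning) between components.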
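
    To make the executor/task relationship concrete, here is a minimal sketch in plain Java. It is not Storm code: the class TaskAssignmentSketch and the method assignTasks are hypothetical names, and the even, contiguous split of tasks over executors is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not Storm code: spreads a component's tasks over its executors.
final class TaskAssignmentSketch {

    // Returns, for each executor, the list of task ids it would run.
    // Assumption: tasks are split into contiguous, near-equal chunks.
    static List<List<Integer>> assignTasks(int numTasks, int numExecutors) {
        if (numTasks < 1 || numExecutors < 1) {
            throw new IllegalArgumentException("numTasks and numExecutors must be >= 1");
        }
        // An executor without a task would have nothing to run, so cap at numTasks.
        int executors = Math.min(numExecutors, numTasks);
        int base = numTasks / executors;   // tasks every executor gets
        int extra = numTasks % executors;  // first 'extra' executors get one more
        List<List<Integer>> result = new ArrayList<>();
        int nextTask = 0;
        for (int e = 0; e < executors; e++) {
            int size = base + (e < extra ? 1 : 0);
            List<Integer> tasks = new ArrayList<>();
            for (int i = 0; i < size; i++) {
                tasks.add(nextTask++);
            }
            result.add(tasks);
        }
        return result;
    }

    public static void main(String[] args) {
        // 4 tasks on 2 executors: each executor thread runs 2 tasks.
        System.out.println(assignTasks(4, 2)); // prints [[0, 1], [2, 3]]
        // Default case: one task per executor.
        System.out.println(assignTasks(3, 3)); // prints [[0], [1], [2]]
    }
}
```

    With the default of one task per executor, every inner list has exactly one element; setting more tasks than executors makes each thread run several tasks of the same component.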



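    Configuring parallelism

    Parallelism can be configured in several ways, with the following order of precedence (lowest to highest):

    defaults.yaml < storm.yaml < topology-specific configuration < component-level (spout/bolt) configuration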
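
    The precedence chain can be pictured as layered maps, where a higher layer overwrites any key it shares with a lower one. The sketch below is plain Java, not Storm's actual configuration loader: ConfigPrecedenceSketch and merge are hypothetical names, and all values are made up for illustration. Only the keys topology.workers and topology.tasks are real Storm option names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of layered configuration; not Storm's real loader.
final class ConfigPrecedenceSketch {

    // Merges layers ordered from lowest to highest precedence.
    // Later layers overwrite keys set by earlier ones.
    static Map<String, Object> merge(List<? extends Map<String, ?>> layersLowToHigh) {
        Map<String, Object> effective = new LinkedHashMap<>();
        for (Map<String, ?> layer : layersLowToHigh) {
            effective.putAll(layer);
        }
        return effective;
    }

    public static void main(String[] args) {
        // Illustrative values only.
        Map<String, Object> defaultsYaml = Map.of("topology.workers", 1);
        Map<String, Object> stormYaml = Map.of("topology.workers", 2);
        Map<String, Object> topologyConf = Map.of("topology.workers", 4, "topology.tasks", 2);
        Map<String, Object> componentConf = Map.of("topology.tasks", 4);

        Map<String, Object> effective =
                merge(List.of(defaultsYaml, stormYaml, topologyConf, componentConf));
        System.out.println(effective); // prints {topology.workers=4, topology.tasks=4}
    }
}
```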

    As for the actual settings, they are copied here directly from the original:

    Setting the number of workers

    Setting the number of executors



    • Description: The number of executors created for a given component
    • Configuration option: ?
    • How to set in your code (examples): TopologyBuilder#setSpout() and TopologyBuilder#setBolt(), via the parallelism hint argument

    Setting the number of tasks

    Here is an example code snippet to show these settings in practice:

    topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2) // parallelism hint: 2 executors
                   .setNumTasks(4)                            // 4 tasks, i.e. 2 per executor
                   .shuffleGrouping("blue-spout");

    An example of a running topology

     


    The GreenBolt was configured as per the code snippet above whereas BlueSpout and YellowBolt only set the parallelism hint (number of executors). Here is the relevant code:

    Config conf = new Config();
    conf.setNumWorkers(2); // use two worker processes
    
    topologyBuilder.setSpout("blue-spout", new BlueSpout(), 2); // set parallelism hint to 2
    
    topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2)
                   .setNumTasks(4)
                   .shuffleGrouping("blue-spout");
    
    topologyBuilder.setBolt("yellow-bolt", new YellowBolt(), 6)
                   .shuffleGrouping("green-bolt");
    
    StormSubmitter.submitTopology(
            "mytopology",
            conf,
            topologyBuilder.createTopology()
        );
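
    As a quick check of the numbers in this example, the sketch below adds up the parallelism hints and divides them over the workers. It is plain Java rather than Storm code: ExampleTopologyMath and its methods are hypothetical names, the even spread of executors over workers is an assumption, and system components such as ackers are ignored.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Back-of-the-envelope model of the example topology; not Storm code.
final class ExampleTopologyMath {

    // Sums the executors (parallelism hints) over all components.
    static int totalExecutors(Map<String, Integer> executorsByComponent) {
        int total = 0;
        for (int n : executorsByComponent.values()) {
            total += n;
        }
        return total;
    }

    // Upper bound on executors per worker, assuming an even spread (ceiling division).
    static int maxExecutorsPerWorker(int totalExecutors, int numWorkers) {
        if (numWorkers < 1) {
            throw new IllegalArgumentException("numWorkers must be >= 1");
        }
        return (totalExecutors + numWorkers - 1) / numWorkers;
    }

    public static void main(String[] args) {
        Map<String, Integer> executors = new LinkedHashMap<>();
        executors.put("blue-spout", 2);
        executors.put("green-bolt", 2);
        executors.put("yellow-bolt", 6);

        int total = totalExecutors(executors);
        System.out.println(total + " executors, " + maxExecutorsPerWorker(total, 2) + " per worker");
        // prints "10 executors, 5 per worker"
    }
}
```

    So the topology has a combined parallelism of 2 + 2 + 6 = 10 executors, and with two workers each worker process runs about 5 executor threads.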

    And of course Storm comes with additional configuration settings to control the parallelism of a topology, including:

    • TOPOLOGY_MAX_TASK_PARALLELISM: This setting puts a ceiling on the number of executors that can be spawned for a single component. It is typically used during testing to limit the number of threads spawned when running a topology in local mode. You can set this option via e.g. Config#setMaxTaskParallelism().

     

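    How to change the parallelism of a running topology

    Storm can change (increase or decrease) the number of worker processes and the number of executors without restarting the topology. This is called rebalancing.

    There are two main ways to rebalance a topology:

    1. Use the Storm web UI to rebalance the topology.
    2. Use the CLI tool to rebalance the topology, for example: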
    # Reconfigure the topology "mytopology" to use 5 worker processes,
    # the spout "blue-spout" to use 3 executors and
    # the bolt "yellow-bolt" to use 10 executors.
    
    storm rebalance mytopology -n 5 -e blue-spout=3 -e yellow-bolt=10
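
    One limit to keep in mind when rebalancing: the Storm documentation notes that a component's executor count cannot exceed its task count, and the task count is fixed once the topology is submitted. The sketch below models that limit in plain Java. RebalanceSketch and effectiveExecutors are hypothetical names, and clamping an oversized request to the task count is an assumption about how the limit shows up, not a description of Storm internals.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the "executors <= tasks" limit; not Storm code.
final class RebalanceSketch {

    // Returns the executor count each component would end up with,
    // assuming a request above the task count is clamped to the task count.
    static Map<String, Integer> effectiveExecutors(Map<String, Integer> tasksByComponent,
                                                   Map<String, Integer> requestedExecutors) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> request : requestedExecutors.entrySet()) {
            Integer tasks = tasksByComponent.get(request.getKey());
            if (tasks == null) {
                throw new IllegalArgumentException("unknown component: " + request.getKey());
            }
            result.put(request.getKey(), Math.min(request.getValue(), tasks));
        }
        return result;
    }

    public static void main(String[] args) {
        // Task counts from the example: green-bolt sets 4 tasks explicitly,
        // yellow-bolt keeps the default of one task per initial executor (6).
        Map<String, Integer> tasks = new LinkedHashMap<>();
        tasks.put("green-bolt", 4);
        tasks.put("yellow-bolt", 6);

        Map<String, Integer> requested = new LinkedHashMap<>();
        requested.put("green-bolt", 4);   // fits: 4 tasks are available
        requested.put("yellow-bolt", 10); // exceeds the 6 available tasks

        System.out.println(effectiveExecutors(tasks, requested));
        // prints {green-bolt=4, yellow-bolt=6}
    }
}
```

    Under this assumption, a component that may need to scale up later should be given more tasks than initial executors via setNumTasks, as green-bolt is in the example.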
    
  • Original post: https://www.cnblogs.com/sunxucool/p/4310831.html