• Storm


    官网地址:http://storm.apache.org/index.html

    Why use Storm?
    Apache Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!
    Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate.
    Storm integrates with the queueing and database technologies you already use. A Storm topology consumes streams of data and processes those streams in arbitrarily complex ways, repartitioning the streams between each stage of the computation however needed. Read more in the tutorial.

    概念

    The concepts discussed are:

    1. Topologies
    2. Streams
    3. Spouts
    4. Bolts
    5. Stream groupings
    6. Reliability
    7. Tasks
    8. Workers

    官网说明:http://storm.apache.org/releases/0.9.6/Concepts.html

    Topologies 

    拓扑结构:实时应用程序的逻辑被打包到storm拓扑里面了。一个拓扑就是由 spouts 和 bolts 作为节点,通过 stream groupings 链接起来的图。

    Stream

    数据流,由 tuples 组成的数据流,默认包含integers, longs, shorts, bytes, strings, doubles, floats, booleans, and byte arrays.也可以自定义

    每个stream声明都有个ID, 默认ID是'default'

    Spouts

    水龙头,拓扑的数据源,通常spouts从外部源读取tuples并且emit发射到拓扑流程中。spouts可以是可靠的也可以是非可靠的,

    Bolts

    水阀,所有的处理都是在bolts里完成的

    Stream groupings

    在bolts的任务中定义了stream被分流的规则。包含多种grouping

    1.Shuffle grouping : 随机分发

    2.Fields grouping : 指定fields

    3.All grouping : 分发到全部bolts中

    4.Global grouping : 分发到id最小的task中

    5.None grouping :   

    6.Direct grouping :

    7.Local or Shuffle grouping :

    Reliability

    fail ack 方法

    Tasks

    执行线程

    Workers

    一个worker就是一个jvm,里面会执行很多tasks,

    Nimbus(Master)
      集群管理,接收Jar包,调度topology
    Supervisor(Slave)
      启动worker
    Worker
      一个JVM进程资源分配的单位
    Executor
      实际干活的线程
    ZooKeeper
      存储状态信息,调度信息,心跳
      Supervisor向ZK写信息,Nimbus通过ZK获取这些信息,这样nimbus就知道各个slaves的状态信息。

    集群安装

    官网说明:http://storm.apache.org/releases/0.9.6/Setting-up-a-Storm-cluster.html

    API

    官网说明:http://storm.apache.org/releases/0.9.6/javadocs/index.html

  • 相关阅读:
    MySQL 对于千万级的大表要怎么优化?
    mysql数据库优化总结
    php 正则表达式怎么匹配标签里面的style?
    php一行代码获取本周一,本周日,上周一,上周日,本月一日,本月最后一日,上月一日,上月最后一日日期
    PHP过滤常用标签的正则表达式
    php冒泡排序
    PHP中__FUNCTION__与__METHOD__的区别
    关于PHP程序员技术职业生涯规划
    初识DSP
    MATLAB图像处理基础
  • 原文地址:https://www.cnblogs.com/one--way/p/5773204.html
Copyright © 2020-2023  润新知