• Storm


    官网地址:http://storm.apache.org/index.html

    Why use Storm?
    Apache Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!
    Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate.
    Storm integrates with the queueing and database technologies you already use. A Storm topology consumes streams of data and processes those streams in arbitrarily complex ways, repartitioning the streams between each stage of the computation however needed. Read more in the tutorial.

    概念

    The concepts discussed are:

    1. Topologies
    2. Streams
    3. Spouts
    4. Bolts
    5. Stream groupings
    6. Reliability
    7. Tasks
    8. Workers

    官网说明:http://storm.apache.org/releases/0.9.6/Concepts.html

    Topologies 

    拓扑结构:实时应用程序的逻辑被打包到storm拓扑里面了。一个拓扑就是由 spouts 和 bolts 作为节点,通过 stream groupings 链接起来的图。

    Stream

    数据流,由 tuples 组成的数据流,默认包含integers, longs, shorts, bytes, strings, doubles, floats, booleans, and byte arrays.也可以自定义

    每个stream声明都有个ID, 默认ID是'default'

    Spouts

    水龙头,拓扑的数据源,通常spouts从外部源读取tuples并且emit发射到拓扑流程中。spouts可以是可靠的也可以是非可靠的,

    Bolts

    水阀,所有的处理都是在bolts里完成的

    Stream groupings

    在bolts的任务中定义了stream被分流的规则。包含多种grouping

    1.Shuffle grouping : 随机分发

    2.Fields grouping : 指定fields

    3.All grouping : 分发到全部bolts中

    4.Global grouping : 分发到id最小的task中

    5.None grouping :   

    6.Direct grouping :

    7.Local or Shuffle grouping :

    Reliability

    fail ack 方法

    Tasks

    执行线程

    Workers

    一个worker就是一个jvm,里面会执行很多tasks,

    Nimbus(Master)
      集群管理,接收Jar包,调度topology
    Supervisor(Slave)
      启动worker
    Worker
      一个JVM进程资源分配的单位
    Executor
      实际干活的线程
    ZooKeeper
      存储状态信息,调度信息,心跳
      Supervisor向ZK写信息,Nimbus通过ZK获取这些信息,这样nimbus就知道各个slaves的状态信息。

    集群安装

    官网说明:http://storm.apache.org/releases/0.9.6/Setting-up-a-Storm-cluster.html

    API

    官网说明:http://storm.apache.org/releases/0.9.6/javadocs/index.html

  • 相关阅读:
    AFNetworking 3.0迁移指南
    富文本常用封装(NSAttributedString浅析)
    如何在 Objective-C 的环境下实现 defer
    iOS之深入了解控制器View的加载
    10+年程序员总结的20+条经验教训
    Foundation: NSNotificationCenter
    做一款仿映客的直播App?看我就够了
    AFNetworking源码分析
    WWDC2016 Session笔记 – Xcode 8 Auto Layout新特性
    iOS页面间传值的一些方式总结
  • 原文地址:https://www.cnblogs.com/one--way/p/5773204.html
Copyright © 2020-2023  润新知