• Storm-源码分析- Disruptor在storm中的使用


    Disruptor 2.0, (http://ifeve.com/disruptor-2-change/)

    Disruptor为了更便于使用, 在2.0做了比较大的调整, 比较突出的是更换了几乎所有的概念名

    老版本,

    image

    新版本,

    image

    从左到右的变化如下,

    1. Producer –> Publisher
    2. ProducerBarrier被integrate到RingBuffer里面, 叫做PublishPort, 提供publish接口
    3. Entry –> Event
    4, Cursor封装成Sequence, 其实Sequence就是将cursor+pading封装一下
    5. Consumer –> EventProcesser
    6. ConsumerBarrier 变为DependencyBarrier, 或SequenceBarrier

    并且对于publisher和EventProcesser, 存在ClaimStrategy和WaitStrategy
    对于publisher的ClaimStrategy, 由于publisher需要先claim到sequencer才能publish: SingleThreadedClaimStrategy, MultiThreadedClaimStrategy, 应该是对于singlethread不需要使用CAS更为高效
    对于EventProcesser的WaitStrategy, 当取不到数据的时候采用什么样的策略进行等待: BlockingWaitStrategy, BusySpinWaitStrategy, SleepingWaitStrategy, YieldingWaitStrategy
    Blocking就是同步加锁, BusySpin就是忙等耗CPU, 都比较低效
    Yielding就是调用thread.yeild(), 把线程的从可执行状态调整成就绪装, 意思我先息下, 你们忙你们先来, 就是把CPU让给其他的线程, 但是yeild并不保证过多久线程被执行, 如果没有其他线程, 可能会被立即执行
    而sleep, 会强制线程休眠指定时间, 然后再重新调度

     

    DisruptorQueue.java

        static final Object FLUSH_CACHE = new Object(); //特殊对象, 当consumer取到时, 触发cache queue的flush
        static final Object INTERRUPT = new Object(); //特殊对象, 当consumer取到时, 触发InterruptedException
        
        RingBuffer<MutableObject> _buffer; //Disruptor的主要的数据结构RingBuffer
        Sequence _consumer; //consumer读取序号
        SequenceBarrier _barrier; //用于consumer监听RingBuffer的序号情况
        
        // TODO: consider having a threadlocal cache of this variable to speed up reads?
        volatile boolean consumerStartedFlag = false; //标志consumer是否start, 由于需要在change后其他线程可以马上知道, 所以使用volatile
        ConcurrentLinkedQueue<Object> _cache = new ConcurrentLinkedQueue(); //当consumer没有start的时候, cache event的queue

    ConcurrentLinkedQueue, 使用CAS而非lock来实现的线程安全队列, 具体参考(http://blog.sina.com.cn/s/blog_5efa3473010129pj.html)

    首先声明一组变量, 部分会在构造函数中被初始化
    最重要的结构就是RingBuffer, 这是个模板类, 这里从ObjectEventFactory()的实现也可以看出来, 初始化的时候在ringbuffer的每个entry上都创建一个MutableObject对象

    MutableObject的实现很简单, 这是封装了object o, 为什么要做这层封装?
    为了避免Java GC, 对于RingBuffer一旦初始化好, 上面的所有的MutableObject都不会被释放, 你只是去对object o, set不同的值

    _buffer = new RingBuffer<MutableObject>(new ObjectEventFactory(), claim, wait);
    public static class ObjectEventFactory implements EventFactory<MutableObject> {
        @Override
        public MutableObject newInstance() {
            return new MutableObject();
        }        
    }
    public class MutableObject {
        Object o = null;
    }


    Publish

    Publish过程, 可见当前ProducerBarrier已经被集成到RingBuffer里面, 所以直接调用_buffer的接口
    首先调用next, claim序号
    取出序号上的MutableObject, 并将输入obj set
    最后, publish当前序号, 表示consumer可以读取
    当consumer没有start时, 会将obj cache在_cache中, 而不会放到ringbuffer中 (我没有想明白why? 为何要使用低效的链表queue来cache, 而不直接放到ringbuffer里面)

        public void publish(Object obj, boolean block) throws InsufficientCapacityException {
            if(consumerStartedFlag) {
                final long id;
                if(block) {
                    id = _buffer.next();
                } else {
                    id = _buffer.tryNext(1);
                }
                final MutableObject m = _buffer.get(id);
                m.setObject(obj);
                _buffer.publish(id);
            } else {
                _cache.add(obj);
                if(consumerStartedFlag) flushCache();
            }
        }

    Consume

    consume的过程, 这里实现的时Batch consume, 即给定Cursor, 会一直consume到该cursor为止

    _consumer代表当前已经被consume的序号, 所以从_consumer.get() + 1开始读
    取出MutableObject中的o, 并将MutableObject 清空
    根据o的情况, 3种情况,
        1. 如果是FLUSH_CACHE对象, 将cache中的event读出调用handler.onEvent
        2. 如果是INTERRUPT对象, 触发InterruptedException
        3. 正常情况, 直接调用handler.onEvent处理该o, curr == cursor判断表示batch是否结束, 当读到cursor的时候结束

    最终将_consumer置为cursor, 表示已经读到cursor位置

        private void consumeBatchToCursor(long cursor, EventHandler<Object> handler) {
            for(long curr = _consumer.get() + 1; curr <= cursor; curr++) {
                try {
                    MutableObject mo = _buffer.get(curr);
                    Object o = mo.o;
                    mo.setObject(null);
                    if(o==FLUSH_CACHE) {
                        Object c = null;
                        while(true) {                        
                            c = _cache.poll();
                            if(c==null) break;
                            else handler.onEvent(c, curr, true);
                        }
                    } else if(o==INTERRUPT) {
                        throw new InterruptedException("Disruptor processing interrupted");
                    } else {
                        handler.onEvent(o, curr, curr == cursor);
                    }
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }
            //TODO: only set this if the consumer cursor has changed?
            _consumer.set(cursor);
        }

    backtype.storm.disruptor.clj

    创建DisruptorQueue, 选用MultiThreadedClaimStrategy和BlockingWaitStrategy

    (defnk disruptor-queue [buffer-size :claim-strategy :multi-threaded :wait-strategy :block]
      (DisruptorQueue. ((CLAIM-STRATEGY claim-strategy) buffer-size)
                       (mk-wait-strategy wait-strategy)
                       ))

    并封装一系列Java接口
    最重要的工作是, 启动consume-loop
    这里ret是closeover了一个间隔为0的不停执行(consume-batch-when-available queue handler) 的线程, 而consumeBatchWhenAvailable的实现就是不停的sleep并调用consumeBatchToCursor

    并且通过consumer-started!通知其他线程consumer已经start

    (defnk consume-loop* [^DisruptorQueue queue handler :kill-fn (fn [error] (halt-process! 1 "Async loop died!"))
                          :thread-name nil]
      (let [ret (async-loop
                  (fn []
                    (consume-batch-when-available queue handler)
                    0 )
                  :kill-fn kill-fn
                  :thread-name thread-name
                  )]
         (consumer-started! queue)
         ret
         ))
    
    (defmacro consume-loop [queue & handler-args]
      `(let [handler# (handler ~@handler-args)]
         (consume-loop* ~queue handler#)
         ))


    看看async-loop实现什么功能?
    返回reify实现的record, 其中closeover了thread
    这个thread主要就是死循环的执行传入的afn, 并且以afn的返回值作为执行间隔

    主要功能, 异步的loop, 开启新的线程来执行loop, 而不是在当前主线程, 并且提供了sleep设置

    ;; afn returns amount of time to sleep
    (defnk async-loop [afn
                       :daemon false
                       :kill-fn (fn [error] (halt-process! 1 "Async loop died!"))
                       :priority Thread/NORM_PRIORITY
                       :factory? false
                       :start true
                       :thread-name nil]
      (let [thread (Thread.
                    (fn []
                      (try-cause
                        (let [afn (if factory? (afn) afn)]
                          (loop []
                            (let [sleep-time (afn)]
                              (when-not (nil? sleep-time)
                                (sleep-secs sleep-time)
                                (recur))
                              )))
                        (catch InterruptedException e
                          (log-message "Async loop interrupted!")
                          )
                        (catch Throwable t
                          (log-error t "Async loop died!")
                          (kill-fn t)
                          ))
                      ))]
        (.setDaemon thread daemon)
        (.setPriority thread priority)
        (when thread-name
          (.setName thread (str (.getName thread) "-" thread-name)))
        (when start
          (.start thread))
        ;; should return object that supports stop, interrupt, join, and waiting?
        (reify SmartThread
          (start [this]
            (.start thread))
          (join [this]
            (.join thread))
          (interrupt [this]
            (.interrupt thread))
          (sleeping? [this]
            (Time/isThreadWaiting thread)
            ))
          ))

     

    Storm在Worker中executors线程间通信, 如何使用Disruptor的?

    image

     

    Understanding the Internal Message Buffers of Storm, 可以参考

  • 相关阅读:
    ps图像渐变
    QPaintDevice: Cannot destroy paint device that is being painted
    QWidget::paintEngine: Should no longer be called
    权谋 — 朱元璋
    TL(简单)
    Access“输入的表达式中含有一个无效日期值”
    Qt label加边框
    Guardian of Decency(二分图)
    匈牙利算法的小总结
    Simple Molecules(简单)
  • 原文地址:https://www.cnblogs.com/fxjwind/p/3182351.html
Copyright © 2020-2023  润新知