• [源码分析] 分布式任务队列 Celery 之 发送Task & AMQP


    [源码分析] 分布式任务队列 Celery 之 发送Task & AMQP

    0x00 摘要

    Celery是一个简单、灵活且可靠的,处理大量消息的分布式系统,专注于实时处理的异步任务队列,同时也支持任务调度。

    在之前的文章中,我们看到了关于Task的分析,本文我们重点看看在客户端如何发送Task,以及 Celery 的amqp对象如何使用。

    在阅读之前,我们依然要提出几个问题,以此作为阅读时候的指引:

    • 客户端启动时候,Celery 应用 和 用户自定义 Task 是如何生成的?
    • Task 装饰器起到了什么作用?
    • 发送 Task 时候,消息是如何组装的?
    • 发送 Task 时候,采用什么媒介(模块)来发送?amqp?
    • Task 发送出去之后,在 Redis 之中如何存储?

    说明:在整理文章时,发现漏发了一篇,从而会影响大家阅读思路,特此补上,请大家谅解。

    [源码分析] 消息队列 Kombu 之 mailbox

    [源码分析] 消息队列 Kombu 之 Hub

    [源码分析] 消息队列 Kombu 之 Consumer

    [源码分析] 消息队列 Kombu 之 Producer

    [源码分析] 消息队列 Kombu 之 启动过程

    [源码解析] 消息队列 Kombu 之 基本架构

    [源码解析] 并行分布式框架 Celery 之架构 (1)

    [源码解析] 并行分布式框架 Celery 之架构 (2)

    [源码解析] 并行分布式框架 Celery 之 worker 启动 (1)

    [源码解析] 并行分布式框架 Celery 之 worker 启动 (2)

    [源码解析] 分布式任务队列 Celery 之启动 Consumer

    [源码解析] 并行分布式任务队列 Celery 之 Task是什么

    [从源码学设计]celery 之 发送Task & AMQP 就是本文,从客户端角度讲解发送Task

    [源码解析] 并行分布式任务队列 Celery 之 消费动态流程 下一篇文章从服务端角度讲解收到 Task 如何消费

    [源码解析] 并行分布式任务队列 Celery 之 多进程模型

    0x01 示例代码

    我们首先给出示例代码。

    1.1 服务端

    示例代码服务端如下,这里使用了装饰器来包装待执行任务。

    from celery import Celery
    
    app = Celery('myTest', broker='redis://localhost:6379')
    
    @app.task
    def add(x,y):
        return x+y
    
    if __name__ == '__main__':
        app.worker_main(argv=['worker'])
    

    1.2 客户端

    客户端发送代码如下,就是调用 add Task 来做加法计算:

    from myTest import add
    re = add.apply_async((2,17))
    

    我们开始具体介绍,以下均是客户端的执行序列。

    0x02 系统启动

    我们首先要介绍 在客户端,Celery 系统和 task(实例) 是如何启动的。

    2.1 产生Celery

    如下代码首先会执行 myTest 这个 Celery。

    app = Celery('myTest', broker='redis://localhost:6379')
    

    2.2 task 装饰器

    Celery 使用了装饰器来包装待执行任务(因为各种语言的类似概念,在本文中可能会混用装饰器或者注解这两个术语)

    @app.task
    def add(x,y):
        return x+y
    

    task这个装饰器具体执行其实就是返回 _create_task_cls 这个内部函数执行的结果

    这个函数返回一个Proxy,Proxy 在真正执行到的时候,会执行 _task_from_fun

    _task_from_fun 的作用是:将该task添加到全局变量中,即 当调用 _task_from_fun 时会将该任务添加到app任务列表中,以此达到所有任务共享的目的这样客户端才能知道这个 task

        def task(self, *args, **opts):
            """Decorator to create a task class out of any callable. """
            if USING_EXECV and opts.get('lazy', True):
                from . import shared_task
                return shared_task(*args, lazy=False, **opts)
    
            def inner_create_task_cls(shared=True, filter=None, lazy=True, **opts):
                _filt = filter
    
                def _create_task_cls(fun):
                    if shared:
                        def cons(app):
                            return app._task_from_fun(fun, **opts) # 将该task添加到全局变量中,当调用_task_from_fun时会将该任务添加到app任务列表中,以此达到所有任务共享的目的
                        cons.__name__ = fun.__name__
                        connect_on_app_finalize(cons)
                    if not lazy or self.finalized:
                        ret = self._task_from_fun(fun, **opts)
                    else:
                        # return a proxy object that evaluates on first use
                        ret = PromiseProxy(self._task_from_fun, (fun,), opts,
                                           __doc__=fun.__doc__)
                        self._pending.append(ret)
                    if _filt:
                        return _filt(ret)
                    return ret
    
                return _create_task_cls
    
            if len(args) == 1:
                if callable(args[0]):
                    return inner_create_task_cls(**opts)(*args) #执行在这里
            return inner_create_task_cls(**opts)
    

    我们具体分析下这个装饰器。

    2.2.1 添加任务

    在初始化过程中,为每个app添加该任务时,会调用到app._task_from_fun(fun, **options)

    具体作用是:

    • 判断各种参数配置;
    • 动态创建task;
    • 将任务添加到_tasks任务中;
    • 用task的bind方法绑定相关属性到该实例上;

    代码如下:

        def _task_from_fun(self, fun, name=None, base=None, bind=False, **options):
    
            name = name or self.gen_task_name(fun.__name__, fun.__module__)         # 如果传入了名字则使用,否则就使用moudle name的形式
            base = base or self.Task                                                # 是否传入Task,否则用类自己的Task类 默认celery.app.task:Task
    
            if name not in self._tasks:                                             # 如果要加入的任务名称不再_tasks中
                run = fun if bind else staticmethod(fun)                            # 是否bind该方法是则直接使用该方法,否则就置为静态方法
                task = type(fun.__name__, (base,), dict({
                    'app': self,                                                    # 动态创建Task类实例
                    'name': name,                                                   # Task的name
                    'run': run,                                                     # task的run方法
                    '_decorated': True,                                             # 是否装饰
                    '__doc__': fun.__doc__,
                    '__module__': fun.__module__,
                    '__header__': staticmethod(head_from_fun(fun, bound=bind)),
                    '__wrapped__': run}, **options))()                              
                # for some reason __qualname__ cannot be set in type()
                # so we have to set it here.
                try:
                    task.__qualname__ = fun.__qualname__                            
                except AttributeError:
                    pass
                self._tasks[task.name] = task                                       # 将任务添加到_tasks任务中
                task.bind(self)  # connects task to this app                        # 调用task的bind方法绑定相关属性到该实例上
    
                add_autoretry_behaviour(task, **options)
            else:
                task = self._tasks[name]
            return task  
    

    2.2.2 绑定

    bind方法的作用是:绑定相关属性到该实例上,因为只知道 task 名字或者代码是不够的,还需要在运行时候拿到 task 的实例

    @classmethod
    def bind(cls, app):
        was_bound, cls.__bound__ = cls.__bound__, True
        cls._app = app                                          # 设置类的_app属性
        conf = app.conf                                         # 获取app的配置信息
        cls._exec_options = None  # clear option cache
    
        if cls.typing is None:
            cls.typing = app.strict_typing
    
        for attr_name, config_name in cls.from_config:          # 设置类中的默认值
            if getattr(cls, attr_name, None) is None:           # 如果获取该属性为空
                setattr(cls, attr_name, conf[config_name])      # 使用app配置中的默认值
    
        # decorate with annotations from config.
        if not was_bound:
            cls.annotate()
    
            from celery.utils.threads import LocalStack
            cls.request_stack = LocalStack()                    # 使用线程栈保存数据
    
        # PeriodicTask uses this to add itself to the PeriodicTask schedule.
        cls.on_bound(app)
    
        return app
    

    2.3 小结

    至此,在客户端(使用者方),Celery 应用已经启动,一个task实例也已经生成,其属性都被绑定在实例上

    0x03 amqp类

    在客户端调用 apply_async 的时候,会调用 app.send_task 来具体发送任务,其中用到 amqp,所以我们首先讲讲 amqp 类。

    3.1 生成

    在 send_task 之中有如下代码,就是:

        def send_task(self, ....):
            """Send task by name.
            """
            parent = have_parent = None
            amqp = self.amqp # 此时生成
    

    此时的 self 是 Celery 应用本身,具体内容我们打印出来看看,从下面我们可以看到 Celery 应用是什么样子。

    self = {Celery} <Celery myTest at 0x1eeb5590488>
     AsyncResult = {type} <class 'celery.result.AsyncResult'>
     Beat = {type} <class 'celery.apps.beat.Beat'>
     GroupResult = {type} <class 'celery.result.GroupResult'>
     Pickler = {type} <class 'celery.app.utils.AppPickler'>
     ResultSet = {type} <class 'celery.result.ResultSet'>
     Task = {type} <class 'celery.app.task.Task'>
     WorkController = {type} <class 'celery.worker.worker.WorkController'>
     Worker = {type} <class 'celery.apps.worker.Worker'>
     amqp = {AMQP} <celery.app.amqp.AMQP object at 0x000001EEB5884188>
     amqp_cls = {str} 'celery.app.amqp:AMQP'
     backend = {DisabledBackend} <celery.backends.base.DisabledBackend object at 0x000001EEB584E248>
     clock = {LamportClock} 0
     control = {Control} <celery.app.control.Control object at 0x000001EEB57B37C8>
     events = {Events} <celery.app.events.Events object at 0x000001EEB56C7188>
     loader = {AppLoader} <celery.loaders.app.AppLoader object at 0x000001EEB5705408>
     main = {str} 'myTest'
     pool = {ConnectionPool} <kombu.connection.ConnectionPool object at 0x000001EEB57A9688>
     producer_pool = {ProducerPool} <kombu.pools.ProducerPool object at 0x000001EEB6297508>
     registry_cls = {type} <class 'celery.app.registry.TaskRegistry'>
     tasks = {TaskRegistry: 10} {'myTest.add': <@task: myTest.add of myTest at 0x1eeb5590488>, 'celery.accumulate': <@task: celery.accumulate of myTest at 0x1eeb5590488>, 'celery.chord_unlock': <@task: celery.chord_unlock of myTest at 0x1eeb5590488>, 'celery.chunks': <@task: celery.chunks of myTest at 0x1eeb5590488>, 'celery.backend_cleanup': <@task: celery.backend_cleanup of myTest at 0x1eeb5590488>, 'celery.group': <@task: celery.group of myTest at 0x1eeb5590488>, 'celery.map': <@task: celery.map of myTest at 0x1eeb5590488>, 'celery.chain': <@task: celery.chain of myTest at 0x1eeb5590488>, 'celery.starmap': <@task: celery.starmap of myTest at 0x1eeb5590488>, 'celery.chord': <@task: celery.chord of myTest at 0x1eeb5590488>}
    

    堆栈为:

    amqp, base.py:1205
    __get__, objects.py:43
    send_task, base.py:705
    apply_async, task.py:565
    <module>, myclient.py:4
    

    为什么赋值语句就可以生成 amqp?是因为其被 cached_property 修饰。

    使用 cached_property 修饰过的函数,就变成是对象的属性,该对象第一次引用该属性时,会调用函数,对象第二次引用该属性时就直接从词典中取了,即 Caches the return value of the get method on first call。

        @cached_property
        def amqp(self):
            """AMQP related functionality: :class:`~@amqp`."""
            return instantiate(self.amqp_cls, app=self)
    

    3.2 定义

    AMQP类就是对amqp协议实现的再一次封装,在这里其实就是对 kombu 类的再一次封装

    class AMQP:
        """App AMQP API: app.amqp."""
    
        Connection = Connection
        Consumer = Consumer
        Producer = Producer
    
        #: compat alias to Connection
        BrokerConnection = Connection
    
        queues_cls = Queues
    
        #: Cached and prepared routing table.
        _rtable = None
    
        #: Underlying producer pool instance automatically
        #: set by the :attr:`producer_pool`.
        _producer_pool = None
    
        # Exchange class/function used when defining automatic queues.
        # For example, you can use ``autoexchange = lambda n: None`` to use the
        # AMQP default exchange: a shortcut to bypass routing
        # and instead send directly to the queue named in the routing key.
        autoexchange = None
    

    具体内容我们打印出来看看,我们可以看到 amqp 是什么样子。

    amqp = {AMQP}  
     BrokerConnection = {type} <class 'kombu.connection.Connection'>
     Connection = {type} <class 'kombu.connection.Connection'>
     Consumer = {type} <class 'kombu.messaging.Consumer'>
     Producer = {type} <class 'kombu.messaging.Producer'>
     app = {Celery} <Celery myTest at 0x252bd2903c8>
     argsrepr_maxsize = {int} 1024
     autoexchange = {NoneType} None
     default_exchange = {Exchange} Exchange celery(direct)
     default_queue = {Queue} <unbound Queue celery -> <unbound Exchange celery(direct)> -> celery>
     kwargsrepr_maxsize = {int} 1024
     producer_pool = {ProducerPool} <kombu.pools.ProducerPool object at 0x00000252BDC8F408>
     publisher_pool = {ProducerPool} <kombu.pools.ProducerPool object at 0x00000252BDC8F408>
     queues = {Queues: 1} {'celery': <unbound Queue celery -> <unbound Exchange celery(direct)> -> celery>}
     queues_cls = {type} <class 'celery.app.amqp.Queues'>
     router = {Router} <celery.app.routes.Router object at 0x00000252BDC6B248>
     routes = {tuple: 0} ()
     task_protocols = {dict: 2} {1: <bound method AMQP.as_task_v1 of <celery.app.amqp.AMQP object at 0x00000252BDC74148>>, 2: <bound method AMQP.as_task_v2 of <celery.app.amqp.AMQP object at 0x00000252BDC74148>>}
     utc = {bool} True
      _event_dispatcher = {EventDispatcher} <celery.events.dispatcher.EventDispatcher object at 0x00000252BE750348>
      _producer_pool = {ProducerPool} <kombu.pools.ProducerPool object at 0x00000252BDC8F408>
      _rtable = {tuple: 0} ()
    

    具体逻辑如下:

    +---------+
    | Celery  |    +----------------------------+
    |         |    |   celery.app.amqp.AMQP     |
    |         |    |                            |
    |         |    |                            |
    |         |    |          BrokerConnection +----->  kombu.connection.Connection
    |         |    |                            |
    |   amqp+----->+          Connection       +----->  kombu.connection.Connection
    |         |    |                            |
    +---------+    |          Consumer         +----->  kombu.messaging.Consumer
                   |                            |
                   |          Producer         +----->  kombu.messaging.Producer
                   |                            |
                   |          producer_pool    +----->  kombu.pools.ProducerPool
                   |                            |
                   |          queues           +----->  celery.app.amqp.Queues
                   |                            |
                   |          router           +----->  celery.app.routes.Router
                   +----------------------------+
    

    0x04 发送Task

    我们接着看看客户端如何发送task。

    from myTest import add
    re = add.apply_async((2,17))
    

    总述下逻辑:

    • Producer 初始化过程完成了连接用的内容,比如调用self.connect方法,到预定的Transport类中连接载体,并初始化Chanel,self.chanel = self.connection;
    • 调用 Message 封装消息;
    • Exchange 将 routing_key 转为 queue;
    • 调用 amqp 发送消息;
    • Channel 负责最终消息发布;

    我们下面详细解读下。

    4.1 apply_async in task

    这里重要的是几点:

    • 进行了组装待发送任务的任务的参数,如 connection,queue,exchange,routing_key等
    • 如果是 task_always_eager,则产生一个 Kombu . producer;即如果是配置了本地直接执行则本地执行直接返回结果
    • 否则,调用 amqp 来发送 task(我们主要看这里);

    缩减版代码如下:

        def apply_async(self, args=None, kwargs=None, task_id=None, producer=None,
                        link=None, link_error=None, shadow=None, **options):
            """Apply tasks asynchronously by sending a message.
            """
            
            preopts = self._get_exec_options()
            options = dict(preopts, **options) if options else preopts
    
            app = self._get_app()
            if app.conf.task_always_eager:
                # 获取 producer
                with app.producer_or_acquire(producer) as eager_producer:      
                    serializer = options.get('serializer')
                    body = args, kwargs
                    content_type, content_encoding, data = serialization.dumps(
                        body, serializer,
                    )
                    args, kwargs = serialization.loads(
                        data, content_type, content_encoding,
                        accept=[content_type]
                    )
                with denied_join_result():
                    return self.apply(args, kwargs, task_id=task_id or uuid(),
                                      link=link, link_error=link_error, **options)
            else:
                return app.send_task( #调用到这里
                    self.name, args, kwargs, task_id=task_id, producer=producer,
                    link=link, link_error=link_error, result_cls=self.AsyncResult,
                    shadow=shadow, task_type=self,
                    **options
                )
    

    此时如下:

             1  apply_async       +-------------------+
                                  |                   |
    User  +---------------------> | task: myTest.add  |
                                  |                   |
                                  +-------------------+
    

    4.2 send_task

    此函数作用是生成任务信息,调用amqp发送任务:

    • 获取amqp实例;
    • 设置任务id,如果没有传入则生成任务id;
    • 生成路由值,如果没有则使用amqp的router;
    • 生成route信息;
    • 生成任务信息;
    • 如果有连接则生成生产者;
    • 发送任务消息;
    • 生成异步任务实例;
    • 返回结果;

    这里调用到了 Celery 应用。为啥还要调用到 Celery 应用本身呢?Task 自身没有关于 MQ 的任何消息,而只有一个绑定的 Celery 对象,所以从抽象层面就只能交给 Celery 了,而 Celery 却包含了所有你需要的信息,是可以完成这个任务的。

    具体如下:

    def send_task(self, name, ...):
        """Send task by name.
        """
        parent = have_parent = None
        amqp = self.amqp                                                    # 获取amqp实例
        task_id = task_id or uuid()                                         # 设置任务id,如果没有传入则生成任务id
        producer = producer or publisher  # XXX compat                      # 生成这
        router = router or amqp.router                                      # 路由值,如果没有则使用amqp的router
        options = router.route(
            options, route_name or name, args, kwargs, task_type)           # 生成route信息
    
        message = amqp.create_task_message( # 生成任务信息
            task_id, name, args, kwargs, countdown, eta, group_id, group_index,
            expires, retries, chord,
            maybe_list(link), maybe_list(link_error),
            reply_to or self.thread_oid, time_limit, soft_time_limit,
            self.conf.task_send_sent_event,
            root_id, parent_id, shadow, chain,
            argsrepr=options.get('argsrepr'),
            kwargsrepr=options.get('kwargsrepr'),
        )
    
        if connection:
            producer = amqp.Producer(connection)                            # 如果有连接则生成生产者
        
        with self.producer_or_acquire(producer) as P:                       
            with P.connection._reraise_as_library_errors():
                self.backend.on_task_call(P, task_id)
                amqp.send_task_message(P, name, message, **options)         # 发送任务消息 
        
        result = (result_cls or self.AsyncResult)(task_id)                  # 生成异步任务实例
        if add_to_parent:
            if not have_parent:
                parent, have_parent = self.current_worker_task, True
            if parent:
                parent.add_trail(result)
        return result                                                       # 返回结果
    

    此时如下:

             1  apply_async       +-------------------+
                                  |                   |
    User  +---------------------> | task: myTest.add  |
                                  |                   |
                                  +--------+----------+
                                           |
                                           |
                            2 send_task    |
                                           |
                                           v
                                    +------+--------+
                                    | Celery myTest |
                                    |               |
                                    +------+--------+
                                           |
                                           |
                      3 send_task_message  |
                                           |
                                           v
                                   +-------+---------+
                                   |      amqp       |
                                   |                 |
                                   |                 |
                                   +-----------------+
    
    

    4.3 生成消息内容

    as_task_v2 会具体生成消息内容,消息体的预处理都是在这里完成的,例如检验和转换参数格式。

    大家可以看到如果实现一个消息,需要用到几个大部分,这里奇怪的是,对于一个异步调用,task 名和 id 都是放在 headers 里头的,而参数什么的却是放在 body 里面:

    • headers,包括:task name, task id, expires, 等等;
    • 消息类型 和 编码方式:content-encoding,content-type;
    • 参数:这些就是 Celery 特有的,用来区分不同队列的,比如:exchange,routing_key 等等;
    • body : 就是消息体;

    最终具体消息举例如下:

    {
    	"body": "W1syLCA4XSwge30sIHsiY2FsbGJhY2tzIjogbnVsbCwgImVycmJhY2tzIjogbnVsbCwgImNoYWluIjogbnVsbCwgImNob3JkIjogbnVsbH1d",
    	"content-encoding": "utf-8",
    	"content-type": "application/json",
    	"headers": {
    		"lang": "py",
    		"task": "myTest.add",
    		"id": "243aac4a-361b-4408-9e0c-856e2655b7b5",
    		"shadow": null,
    		"eta": null,
    		"expires": null,
    		"group": null,
    		"group_index": null,
    		"retries": 0,
    		"timelimit": [null, null],
    		"root_id": "243aac4a-361b-4408-9e0c-856e2655b7b5",
    		"parent_id": null,
    		"argsrepr": "(2, 8)",
    		"kwargsrepr": "{}",
    		"origin": "gen33652@DESKTOP-0GO3RPO"
    	},
    	"properties": {
    		"correlation_id": "243aac4a-361b-4408-9e0c-856e2655b7b5",
    		"reply_to": "b34fcf3d-da9a-3717-a76f-44b6a6362da1",
    		"delivery_mode": 2,
    		"delivery_info": {
    			"exchange": "",
    			"routing_key": "celery"
    		},
    		"priority": 0,
    		"body_encoding": "base64",
    		"delivery_tag": "fa1bc9c8-3709-4c02-9543-8d0fe3cf4e6c"
    	}
    }
    

    具体代码如下,这里的 sent_event 是后续发送时候需要,并不体现在具体消息内容之中:

    def as_task_v2(self, task_id, name, args=None, kwargs=None, ......):
    
        ......
        
        return task_message(
            headers={
                'lang': 'py',
                'task': name,
                'id': task_id,
                'shadow': shadow,
                'eta': eta,
                'expires': expires,
                'group': group_id,
                'group_index': group_index,
                'retries': retries,
                'timelimit': [time_limit, soft_time_limit],
                'root_id': root_id,
                'parent_id': parent_id,
                'argsrepr': argsrepr,
                'kwargsrepr': kwargsrepr,
                'origin': origin or anon_nodename()
            },
            properties={
                'correlation_id': task_id,
                'reply_to': reply_to or '',
            },
            body=(
                args, kwargs, {
                    'callbacks': callbacks,
                    'errbacks': errbacks,
                    'chain': chain,
                    'chord': chord,
                },
            ),
            sent_event={
                'uuid': task_id,
                'root_id': root_id,
                'parent_id': parent_id,
                'name': name,
                'args': argsrepr,
                'kwargs': kwargsrepr,
                'retries': retries,
                'eta': eta,
                'expires': expires,
            } if create_sent_event else None,
        )
    

    4.4 send_task_message in amqp

    amqp.send_task_message(P, name, message, **options) 是用来 amqp 发送任务。

    该方法主要是组装待发送任务的参数,如connection,queue,exchange,routing_key等,调用 producer 的 publish 发送任务。

    基本套路就是:

    • 获得 queue;
    • 获得 delivery_mode;
    • 获得 exchange;
    • 获取重试策略等;
    • 调用 producer 来发送消息;
            def send_task_message(producer, name, message,
                                  exchange=None, routing_key=None, queue=None,
                                  event_dispatcher=None,
                                  retry=None, retry_policy=None,
                                  serializer=None, delivery_mode=None,
                                  compression=None, declare=None,
                                  headers=None, exchange_type=None, **kwargs):
        				# 获得 queue, 获得 delivery_mode, 获得 exchange, 获取重试策略等
    
                if before_receivers:
                    send_before_publish(
                        sender=name, body=body,
                        exchange=exchange, routing_key=routing_key,
                        declare=declare, headers=headers2,
                        properties=properties, retry_policy=retry_policy,
                    )
                
                ret = producer.publish(
                    body,
                    exchange=exchange,
                    routing_key=routing_key,
                    serializer=serializer or default_serializer,
                    compression=compression or default_compressor,
                    retry=retry, retry_policy=_rp,
                    delivery_mode=delivery_mode, declare=declare,
                    headers=headers2,
                    **properties
                )
                if after_receivers:
                    send_after_publish(sender=name, body=body, headers=headers2,
                                       exchange=exchange, routing_key=routing_key)
     
                .....
      
                if sent_event: # 这里就处理了sent_event
                    evd = event_dispatcher or default_evd
                    exname = exchange
                    if isinstance(exname, Exchange):
                        exname = exname.name
                    sent_event.update({
                        'queue': qname,
                        'exchange': exname,
                        'routing_key': routing_key,
                    })
                    evd.publish('task-sent', sent_event,
                                producer, retry=retry, retry_policy=retry_policy)
                return ret
            return send_task_message
    

    此时堆栈为:

    send_task_message, amqp.py:473
    send_task, base.py:749
    apply_async, task.py:565
    <module>, myclient.py:4
    

    此时变量为:

    qname = {str} 'celery'
    queue = {Queue} <unbound Queue celery -> <unbound Exchange celery(direct)> -> celery>
     ContentDisallowed = {type} <class 'kombu.exceptions.ContentDisallowed'>
     alias = {NoneType} None
     attrs = {tuple: 18} (('name', None), ('exchange', None), ('routing_key', None), ('queue_arguments', None), ('binding_arguments', None), ('consumer_arguments', None), ('durable', <class 'bool'>), ('exclusive', <class 'bool'>), ('auto_delete', <class 'bool'>), ('no_ack', None), ('alias', None), ('bindings', <class 'list'>), ('no_declare', <class 'bool'>), ('expires', <class 'float'>), ('message_ttl', <class 'float'>), ('max_length', <class 'int'>), ('max_length_bytes', <class 'int'>), ('max_priority', <class 'int'>))
     auto_delete = {bool} False
     binding_arguments = {NoneType} None
     bindings = {set: 0} set()
     can_cache_declaration = {bool} True
     channel = {str} 'Traceback (most recent call last):
      File "C:\Program Files\JetBrains\PyCharm Community Edition 2020.2.2\plugins\python-ce\helpers\pydev\_pydevd_bundle\pydevd_resolver.py", line 178, in _getPyDictionary
        attr = getattr(var, n)
      File "C:\User
     consumer_arguments = {NoneType} None
     durable = {bool} True
     exchange = {Exchange} Exchange celery(direct)
     exclusive = {bool} False
     expires = {NoneType} None
     is_bound = {bool} False
     max_length = {NoneType} None
     max_length_bytes = {NoneType} None
     max_priority = {NoneType} None
     message_ttl = {NoneType} None
     name = {str} 'celery'
     no_ack = {bool} False
     no_declare = {NoneType} None
     on_declared = {NoneType} None
     queue_arguments = {NoneType} None
     routing_key = {str} 'celery'
      _channel = {NoneType} None
      _is_bound = {bool} False
    queues = {Queues: 1} {'celery': <unbound Queue celery -> <unbound Exchange celery(direct)> -> celery>}
    

    此时逻辑如下:

             1  apply_async       +-------------------+
                                  |                   |
    User  +---------------------> | task: myTest.add  |
                                  |                   |
                                  +--------+----------+
                                           |
                                           |
                              2 send_task  |
                                           |
                                           v
                                    +------+--------+
                                    | Celery myTest |
                                    |               |
                                    +------+--------+
                                           |
                                           |
                      3 send_task_message  |
                                           |
                                           v
                                   +-------+---------+
                                   |      amqp       |
                                   +-------+---------+
                                           |
                                           |
                                4 publish  |
                                           |
                                           v
                                      +----+------+
                                      | producer  |
                                      |           |
                                      +-----------+
    

    4.5 publish in producer

    在 produer 之中,调用 channel 来发送信息

    def _publish(self, body, priority, content_type, content_encoding,
                 headers, properties, routing_key, mandatory,
                 immediate, exchange, declare):
        channel = self.channel
        message = channel.prepare_message(
            body, priority, content_type,
            content_encoding, headers, properties,
        )
        if declare:
            maybe_declare = self.maybe_declare
            [maybe_declare(entity) for entity in declare]
    
        # handle autogenerated queue names for reply_to
        reply_to = properties.get('reply_to')
        if isinstance(reply_to, Queue):
            properties['reply_to'] = reply_to.name
        return channel.basic_publish( # 发送消息
            message,
            exchange=exchange, routing_key=routing_key,
            mandatory=mandatory, immediate=immediate,
        )
    

    变量为:

    body = {str} '[[2, 8], {}, {"callbacks": null, "errbacks": null, "chain": null, "chord": null}]'
    compression = {NoneType} None
    content_encoding = {str} 'utf-8'
    content_type = {str} 'application/json'
    declare = {list: 1} [<unbound Queue celery -> <unbound Exchange celery(direct)> -> celery>]
    delivery_mode = {int} 2
    exchange = {str} ''
    exchange_name = {str} ''
    expiration = {NoneType} None
    headers = {dict: 15} {'lang': 'py', 'task': 'myTest.add', 'id': 'af0e4c14-a618-41b4-9340-1479cb7cde4f', 'shadow': None, 'eta': None, 'expires': None, 'group': None, 'group_index': None, 'retries': 0, 'timelimit': [None, None], 'root_id': 'af0e4c14-a618-41b4-9340-1479cb7cde4f', 'parent_id': None, 'argsrepr': '(2, 8)', 'kwargsrepr': '{}', 'origin': 'gen11468@DESKTOP-0GO3RPO'}
    immediate = {bool} False
    mandatory = {bool} False
    priority = {int} 0
    properties = {dict: 3} {'correlation_id': 'af0e4c14-a618-41b4-9340-1479cb7cde4f', 'reply_to': '2c938063-64b8-35f5-ac9f-a1c0915b6f71', 'delivery_mode': 2}
    retry = {bool} True
    retry_policy = {dict: 4} {'max_retries': 3, 'interval_start': 0, 'interval_max': 1, 'interval_step': 0.2}
    routing_key = {str} 'celery'
    self = {Producer} <Producer: <promise: 0x1eeb62c44c8>>
    serializer = {str} 'json'
    

    此时逻辑为:

             1  apply_async       +-------------------+
                                  |                   |
    User  +---------------------> | task: myTest.add  |
                                  |                   |
                                  +--------+----------+
                                           |
                              2 send_task  |
                                           |
                                           v
                                    +------+--------+
                                    | Celery myTest |
                                    |               |
                                    +------+--------+
                                           |
                      3 send_task_message  |
                                           |
                                           v
                                   +-------+---------+
                                   |      amqp       |
                                   +-------+---------+
                                           |
                                4 publish  |
                                           |
                                           v
                                      +----+------+
                                      | producer  |
                                      |           |
                                      +----+------+
                                           |
                                           |
                          5 basic_publish  |
                                           v
                                      +----+------+
                                      |  channel  |
                                      |           |
                                      +-----------+
    

    4.6 Redis Client

    Celery 最后是调用到 Redis Client 完成发送,堆栈如下:

    _put, redis.py:793
    basic_publish, base.py:605
    _publish, messaging.py:200
    _ensured, connection.py:525
    publish, messaging.py:178
    send_task_message, amqp.py:532
    send_task, base.py:749
    apply_async, task.py:565
    <module>, myclient.py:4
    

    具体代码对于:

    def lpush(self, name, *values):
        "Push ``values`` onto the head of the list ``name``"
        return self.execute_command('LPUSH', name, *values)
      
    # COMMAND EXECUTION AND PROTOCOL PARSING
    def execute_command(self, *args, **options):
        "Execute a command and return a parsed response"
        pool = self.connection_pool
        command_name = args[0]
        conn = self.connection or pool.get_connection(command_name, **options)
        try:
            conn.send_command(*args)
            return self.parse_response(conn, command_name, **options)
        except (ConnectionError, TimeoutError) as e:
            conn.disconnect()
            if not (conn.retry_on_timeout and isinstance(e, TimeoutError)):
                raise
            conn.send_command(*args)
            return self.parse_response(conn, command_name, **options)
        finally:
            if not self.connection:
                pool.release(conn)
    

    变量如下:

    args = {tuple: 3} ('LPUSH', 'celery', '{"body": "W1syLCAxN10sIHt9LCB7ImNhbGxiYWNrcyI6IG51bGwsICJlcnJiYWNrcyI6IG51bGwsICJjaGFpbiI6IG51bGwsICJjaG9yZCI6IG51bGx9XQ==", "content-encoding": "utf-8", "content-type": "application/json", "headers": {"lang": "py", "task": "myTest.add", "id": "59bc4efc-df32-49cc-86ca-fcee613369f9", "shadow": null, "eta": null, "expires": null, "group": null, "group_index": null, "retries": 0, "timelimit": [null, null], "root_id": "59bc4efc-df32-49cc-86ca-fcee613369f9", "parent_id": null, "argsrepr": "(2, 17)", "kwargsrepr": "{}", "origin": "gen18117@me2koreademini"}, "properties": {"correlation_id": "59bc4efc-df32-49cc-86ca-fcee613369f9", "reply_to": "d8e56fd1-ef27-3181-bc29-b9fb63f4dbb7", "delivery_mode": 2, "delivery_info": {"exchange": "", "routing_key": "celery"}, "priority": 0, "body_encoding": "base64", "delivery_tag": "7ff8c477-8ee7-4e71-9e88-e0c4ffc32943"}}')
    
    options = {dict: 0} {}
    
    self = {Redis} Redis<ConnectionPool<Connection<host=localhost,port=6379,db=0>>>
    

    至此一个任务就发送出去,等待着消费者消费掉任务。

    一个最终流程图如下:

                    apply_async              send_task              create_task_message
     +-----------+                +------+              +---------+                     +------+
     | user func +--------------> | task | +----------->+  Celery | +-----------------> | amqp |
     +-----------+                +------+              +---------+                     +--+---+
                                                                                           |
                                                                        send_task_message  |
                                                                                           |
                                                                                           v
                                        lpush           +---------+                 +------+---+
                                     +----------------+ | Channel |  <------------+ | Producer |
                                     |                  +---------+                 +----------+
                                     |                               basic_publish
                                     |
    +------------------------------------------------------------------------------------------+
                                     |
                                     |                                              Redis Client
                                     v
     +-------------------------------+----------------------------------+
     |                                                                  |
     | Redis<ConnectionPool<Connection<host=localhost,port=6379,db=0<>> |
     |                                                                  |
     +-------------------------------+----------------------------------+
                                     |
                                     |  send_command
                                     |
                                     v
             +-----------------------+---------------------------+
             |  Redis Connection<host=localhost,port=6379,db=0>  |
             +---------------------------------------------------+
    
    

    4.7 redis 内容

    发送之后,task 就被存储在redis的队列之中。在redis 的结果是:

    127.0.0.1:6379> keys *
    1) "_kombu.binding.reply.testMailbox.pidbox"
    2) "_kombu.binding.testMailbox.pidbox"
    3) "celery"
    4) "_kombu.binding.celeryev"
    5) "_kombu.binding.celery"
    6) "_kombu.binding.reply.celery.pidbox"
    127.0.0.1:6379> lrange celery 0 -1
    1) "{"body": "W1syLCA4XSwge30sIHsiY2FsbGJhY2tzIjogbnVsbCwgImVycmJhY2tzIjogbnVsbCwgImNoYWluIjogbnVsbCwgImNob3JkIjogbnVsbH1d", "content-encoding": "utf-8", "content-type": "application/json", "headers": {"lang": "py", "task": "myTest.add", "id": "243aac4a-361b-4408-9e0c-856e2655b7b5", "shadow": null, "eta": null, "expires": null, "group": null, "group_index": null, "retries": 0, "timelimit": [null, null], "root_id": "243aac4a-361b-4408-9e0c-856e2655b7b5", "parent_id": null, "argsrepr": "(2, 8)", "kwargsrepr": "{}", "origin": "gen33652@DESKTOP-0GO3RPO"}, "properties": {"correlation_id": "243aac4a-361b-4408-9e0c-856e2655b7b5", "reply_to": "b34fcf3d-da9a-3717-a76f-44b6a6362da1", "delivery_mode": 2, "delivery_info": {"exchange": "", "routing_key": "celery"}, "priority": 0, "body_encoding": "base64", "delivery_tag": "fa1bc9c8-3709-4c02-9543-8d0fe3cf4e6c"}}"
    

    4.7.1 delivery_tag 作用

    可以看到,最终消息中,有一个 delivery_tag 变量,这里要特殊说明下。

    可以认为 delivery_tag 是消息在 redis 之中的唯一标示,是 UUID 格式。

    具体举例如下:

    "delivery_tag": "fa1bc9c8-3709-4c02-9543-8d0fe3cf4e6c"

    后续 QoS 就使用 delivery_tag 来做各种处理,比如 ack, snack。

    with self.pipe_or_acquire() as pipe:
        pipe.zadd(self.unacked_index_key, *zadd_args) 
            .hset(self.unacked_key, delivery_tag,
                  dumps([message._raw, EX, RK])) 
            .execute()
        super().append(message, delivery_tag)
    

    4.7.2 delivery_tag 何时生成

    我们关心的是在发送消息时候,何时生成 delivery_tag。

    结果发现是在 Channel 的 _next_delivery_tag 函数中,是在发送消息之前,对消息做了进一步增强。

    def _next_delivery_tag(self):
        return uuid()
    

    具体堆栈如下:

    _next_delivery_tag, base.py:595
    _inplace_augment_message, base.py:614
    basic_publish, base.py:599
    _publish, messaging.py:200
    _ensured, connection.py:525
    publish, messaging.py:178
    send_task_message, amqp.py:532
    send_task, base.py:749
    apply_async, task.py:565
    <module>, myclient.py:4
    

    至此,客户端发送 task 的流程已经结束,有兴趣的同学可以再看看 [源码解析] 并行分布式任务队列 Celery 之 消费动态流程 此文从服务端角度讲解收到 Task 如何消费。

    0xFF 参考

    celery源码分析-Task的初始化与发送任务

    Celery 源码解析三: Task 对象的实现

    分布式任务队列 Celery —— 详解工作流

  • 相关阅读:
    面向对象
    linux下apache重启报错
    mysql登录密码忘记怎么办?
    html基础知识梳理
    用js实现贪吃蛇
    简单轮播图案例
    JavaScript基础学习笔记整理
    读书笔记之《Redis开发与运维》—— 三
    读书笔记之《Redis开发与运维》—— 二
    读书笔记之《Redis开发与运维》—— 一
  • 原文地址:https://www.cnblogs.com/rossiXYZ/p/14672090.html
Copyright © 2020-2023  润新知