• Python's multiprocessing module (part 2)


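    Signaling state between processes

    Likewise, the Event class can pass state information between processes. An event toggles between a set and an unset state. A process waiting on the event can also pass an optional timeout; when the timeout elapses, wait() returns even though the event is still unset.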

        import multiprocessing
        import time

        def wait_for_event(e):
            print("wait for event:starting")
            e.wait()
            print("wait for event:e_is_set()->", e.is_set())

        def wait_for_event_timeout(e, t):
            print("wait_for_event_timeout:starting")
            e.wait(t)
            print("wait_for_event_timeout:e.is_set()->", e.is_set())

        if __name__ == '__main__':
            e = multiprocessing.Event()
            w1 = multiprocessing.Process(
                name="block",
                target=wait_for_event,
                args=(e, )
            )
            w1.start()
            w2 = multiprocessing.Process(
                name="nonblock",
                target=wait_for_event_timeout,
                args=(e, 2)
            )
            w2.start()
            print("main:waiting before calling Event.set()")
            time.sleep(3)
            e.set()
            print("main:event is set")

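    Result: wait(t) returns as soon as the timeout elapses, so the second process sees the event still unset. Calling e.set() sets the event directly and releases the process blocked in wait().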

        main:waiting before calling Event.set()
        wait for event:starting
        wait_for_event_timeout:starting
        wait_for_event_timeout:e.is_set()-> False
        main:event is set
        wait for event:e_is_set()-> True
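
    Event.wait() also returns a boolean, so a waiter can tell a timeout from a real signal without a separate is_set() call. A minimal sketch, assuming a hypothetical helper named wait_status that is not part of the example above:

```python
import multiprocessing


def wait_status(e, timeout):
    """Wait on the event and report how the wait ended (hypothetical helper)."""
    # wait() returns True if the event is set, False if the timeout elapsed first
    if e.wait(timeout):
        return "set"
    return "timed out"


if __name__ == "__main__":
    e = multiprocessing.Event()
    print(wait_status(e, 0.1))  # the event has not been set yet
    e.set()
    print(wait_status(e, 0.1))  # returns immediately once the event is set
```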

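    Controlling access to resources

    When several processes need to share a resource, a Lock prevents conflicting access.

    The API is as follows:

    lock = multiprocessing.Lock()  creates a lock object

    lock.acquire()  acquires the lock

    lock.release()  releases the lock

    with lock:  # acquires the lock, runs the body, then releases it; do not nest, because a Lock is not reentrant

        # application code…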
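
    The calls above can be combined into a runnable sketch. The shared counter (a multiprocessing.Value) and the function names add_many and run are assumptions made for illustration; they do not come from the original text.

```python
import multiprocessing


def add_many(counter, lock, n):
    """Increment the shared counter n times, holding the lock for each update."""
    for _ in range(n):
        with lock:  # acquire, update, release
            counter.value += 1


def run(workers=4, n=1000):
    lock = multiprocessing.Lock()
    # lock=False gives a raw shared integer, so the explicit Lock is the only guard
    counter = multiprocessing.Value("i", 0, lock=False)
    jobs = [
        multiprocessing.Process(target=add_many, args=(counter, lock, n))
        for _ in range(workers)
    ]
    for j in jobs:
        j.start()
    for j in jobs:
        j.join()
    return counter.value


if __name__ == "__main__":
    print("final count:", run())
```

    Without the lock, the read-modify-write in counter.value += 1 can interleave across processes and lose updates.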

    However, locks are hard to get right and they add overhead, so we generally avoid shared data and use message passing and queues (Queue) instead.
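
    As an illustration of the message-passing style, the sketch below sends results through a multiprocessing.Queue instead of sharing a variable. The names produce and drain, and the use of None as an end marker, are assumptions chosen for this example.

```python
import multiprocessing


def produce(q, items):
    """Put each squared item on the queue, then a None marker to signal the end."""
    for item in items:
        q.put(item * item)
    q.put(None)


def drain(q):
    """Read from the queue until the None end marker arrives."""
    results = []
    while True:
        value = q.get()
        if value is None:
            return results
        results.append(value)


if __name__ == "__main__":
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=produce, args=(q, [1, 2, 3]))
    p.start()
    print(drain(q))  # read before join so the producer can flush its data
    p.join()
```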

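    Synchronizing operations

    A Condition object can synchronize the parts of a workflow so that some run in parallel and others run in sequence, even when they live in different processes.

    Here is a simple example: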

        import multiprocessing
        import time

        def stage_1(cond):
            name = multiprocessing.current_process().name
            print("starting", name)
            with cond:
                print("{} done and ready for stage 2".format(name))
                # wake up the waiting processes
                cond.notify_all()

        def stage_2(cond):
            name = multiprocessing.current_process().name
            print("starting", name)
            with cond:
                cond.wait()
                print("{} running".format(name))

        if __name__ == '__main__':
            condition = multiprocessing.Condition()
            s1 = multiprocessing.Process(
                name="s1",
                target=stage_1,
                args=(condition, ),
            )
            s2_client = [
                multiprocessing.Process(
                    name="stage_2[{}]".format(i),
                    target=stage_2,
                    args=(condition, ),
                ) for i in range(1, 3)]
            for c in s2_client:
                c.start()
                time.sleep(1)
            s1.start()

            s1.join()
            for c in s2_client:
                c.join()

    Result (the ordering may differ slightly from machine to machine):

        starting stage_2[1]
        starting stage_2[2]
        starting s1
        s1 done and ready for stage 2
        stage_2[1] running
        stage_2[2] running
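
    The example above relies on time.sleep(1) to make sure both stage_2 processes are already waiting when stage_1 calls notify_all(); a notification sent before a process starts waiting is not remembered. One common way to remove that timing dependence is to pair the condition with a shared flag and wait with wait_for(). The sketch below is a variant under stated assumptions, not the original author's code: the ready flag and the timeout parameter are additions.

```python
import multiprocessing


def stage_1(cond, ready):
    """Mark the shared flag and wake every waiter."""
    with cond:
        ready.value = 1
        cond.notify_all()


def stage_2(cond, ready, timeout=5):
    """Wait until the flag is set; return False if the timeout expires first."""
    with cond:
        # wait_for() checks the predicate before sleeping, so an earlier
        # notify_all() is not lost
        return cond.wait_for(lambda: ready.value == 1, timeout)


if __name__ == "__main__":
    cond = multiprocessing.Condition()
    ready = multiprocessing.Value("b", 0, lock=False)  # guarded by cond itself
    s1 = multiprocessing.Process(target=stage_1, args=(cond, ready))
    s2 = [
        multiprocessing.Process(target=stage_2, args=(cond, ready))
        for _ in range(2)
    ]
    s1.start()  # starting stage 1 first is now safe
    for p in s2:
        p.start()
    for p in [s1] + s2:
        p.join()
```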

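    Controlling concurrent access to a resource

    Sometimes several processes may use a resource at the same time, but the total number must be capped. For example, a network application might support a fixed number of concurrent downloads. A Semaphore can manage these connections; the 3 below is the maximum number of processes allowed in at once.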

        import multiprocessing

        # worker is the function each process runs; the original does not show it
        s = multiprocessing.Semaphore(3)
        jobs = [
            multiprocessing.Process(
                target=worker,
                name=str(i),
                args=(s,),
            )
            for i in range(10)
        ]
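
    The fragment above leaves worker undefined. A self-contained sketch might look like the following; the body of worker (a short sleep standing in for a download) and the hold parameter are assumptions for illustration.

```python
import multiprocessing
import time


def worker(s, hold=0.5):
    """Do the 'download' while holding one of the semaphore's slots."""
    name = multiprocessing.current_process().name
    with s:  # blocks while all slots are taken
        print(name, "acquired a slot")
        time.sleep(hold)  # stand-in for the real work
    return name


if __name__ == "__main__":
    s = multiprocessing.Semaphore(3)  # at most 3 workers inside the with block
    jobs = [
        multiprocessing.Process(target=worker, name=str(i), args=(s,))
        for i in range(10)
    ]
    for j in jobs:
        j.start()
    for j in jobs:
        j.join()
```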

    Managing shared state

    The manager returned by multiprocessing.Manager() supports lists as well as dictionaries.

        import multiprocessing

        def worker(d, key, value):
            # d is a proxy to the manager's dict; no extra Manager is needed here
            d[key] = value

        if __name__ == '__main__':
            mgr = multiprocessing.Manager()
            d = mgr.dict()
            jobs = [
                multiprocessing.Process(
                    target=worker,
                    args=(d, i, i*2),
                ) for i in range(10)
            ]
            for i in jobs:
                i.start()
            for j in jobs:
                j.join()
            print("D->", d)

    Result. Because this dictionary was created through the manager, it is shared by all the processes.

        D-> {1: 2, 3: 6, 0: 0, 5: 10, 8: 16, 2: 4, 7: 14, 6: 12, 4: 8, 9: 18}
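
    A shared list works the same way. The sketch below is an assumed variation on the dictionary example that uses mgr.list(); the function name record is hypothetical.

```python
import multiprocessing


def record(items, value):
    """Append the square of value to the shared list."""
    items.append(value * value)


if __name__ == "__main__":
    with multiprocessing.Manager() as mgr:
        items = mgr.list()
        jobs = [
            multiprocessing.Process(target=record, args=(items, i))
            for i in range(5)
        ]
        for j in jobs:
            j.start()
        for j in jobs:
            j.join()
        # copy into a plain list before the manager shuts down;
        # append order depends on scheduling, so sort for a stable view
        print("L->", sorted(list(items)))
```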

    Shared namespaces

    A manager can also create a shared Namespace object:

    namespace = mgr.Namespace()

    Here is a simple example:

        import multiprocessing

        def producer(ns, event):
            ns.value = "this is a value"
            event.set()

        def consumer(ns, event):
            try:
                print("Before event:{}".format(ns.value))
            except Exception as err:
                print("Before event error:", str(err))
            event.wait()
            print("After event:", ns.value)

        if __name__ == '__main__':
            mgr = multiprocessing.Manager()
            namespace = mgr.Namespace()
            event = multiprocessing.Event()
            p = multiprocessing.Process(
                target=producer,
                args=(namespace, event)
            )
            c = multiprocessing.Process(
                target=consumer,
                args=(namespace, event),
            )
            c.start()
            p.start()
            c.join()
            p.join()

    Result:

        Before event error: 'Namespace' object has no attribute 'value'
        After event: this is a value

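    Updates to the contents of a mutable value stored in the namespace are not propagated automatically. To publish such an update, assign the modified value back to the namespace object.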
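
    To make this concrete: appending to a list stored on the namespace changes only a local copy, while assigning the list back sends the new value to the manager. A sketch, with the attribute name my_list and both function names chosen for illustration:

```python
import multiprocessing


def append_in_place(ns, item):
    """Mutates a local copy only; through a manager the change is not propagated."""
    ns.my_list.append(item)


def append_and_reassign(ns, item):
    """Fetch the list, change it, then re-attach it so the update is propagated."""
    current = ns.my_list
    current.append(item)
    ns.my_list = current


if __name__ == "__main__":
    with multiprocessing.Manager() as mgr:
        ns = mgr.Namespace()
        ns.my_list = []
        for target in (append_in_place, append_and_reassign):
            p = multiprocessing.Process(target=target, args=(ns, "x"))
            p.start()
            p.join()
            print(target.__name__, "->", ns.my_list)
```

    With a manager-backed namespace, the first call leaves ns.my_list empty and only the second one adds the item.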

    Process pools

    The Pool class manages a fixed number of worker processes.

        import multiprocessing

        def do_calculation(data):
            return data * 2

        def start_process():
            print("starting", multiprocessing.current_process().name)

        if __name__ == '__main__':
            inputs = list(range(10))
            print("Input :", inputs)
            builtin_outputs = list(map(do_calculation, inputs))
            print("Built-in:", builtin_outputs)
            pool_size = multiprocessing.cpu_count()*2
            pool = multiprocessing.Pool(
                processes=pool_size,
                initializer=start_process,
            )
            pool_outputs = pool.map(do_calculation, inputs)
            pool.close()
            pool.join()
            print("Pool:", pool_outputs)

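    close() and join() synchronize the pool's tasks with the main process: close() stops new tasks from being submitted, and join() waits for the worker processes to exit. Result: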

        Input : [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
        Built-in: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
        starting SpawnPoolWorker-1
        starting SpawnPoolWorker-2
        starting SpawnPoolWorker-4
        starting SpawnPoolWorker-5
        starting SpawnPoolWorker-6
        starting SpawnPoolWorker-3
        starting SpawnPoolWorker-8
        starting SpawnPoolWorker-7
        Pool: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

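    The Pool class also accepts a maxtasksperchild argument, which tells the pool to replace a worker process after it has completed that many tasks, so that long-running workers do not keep accumulating system resources.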
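
    A short sketch of maxtasksperchild, assuming a limit of 2 tasks per worker and a task function that reports its process id (both choices are illustrative):

```python
import multiprocessing
import os


def do_calculation(data):
    """Return the worker's pid together with the doubled input."""
    return os.getpid(), data * 2


if __name__ == "__main__":
    with multiprocessing.Pool(processes=2, maxtasksperchild=2) as pool:
        # chunksize=1 so each input counts as one task for the limit
        results = pool.map(do_calculation, range(10), chunksize=1)
    print("values:", [value for _, value in results])
    # more than 2 distinct pids means workers were replaced along the way
    print("distinct worker pids:", len({pid for pid, _ in results}))
```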

  • Original article: https://www.cnblogs.com/haoqirui/p/10335078.html