concurrent.futures
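concurrent.futures provides a high-level interface for running callables asynchronously.
The asynchronous execution can be done with threads (ThreadPoolExecutor) or with separate processes (ProcessPoolExecutor).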
The module is new in Python 3.2. It also works on Python 2 through the futures backport, which has to be installed separately: https://pypi.python.org/pypi/futures/2.1.4
【Example 1】A non-concurrent baseline
#!/usr/bin/env python2.6

from Queue import Queue
import random
import time

q = Queue()

fred = [1,2,3,4,5,6,7,8,9,10]

def f(x):
    if random.randint(0,1):
        time.sleep(0.1)
    res = x * x
    q.put(res)

def main():
    for num in fred:
        f(num)
    while not q.empty():
        print q.get()

if __name__ == "__main__":
    main()
【Example 2】Using ThreadPoolExecutor
#!/usr/bin/env python2.7

from Queue import Queue
import concurrent.futures
import random
import time

q = Queue()    # thread-safe, shared by all worker threads

fred = [1,2,3,4,5,6,7,8,9,10]

def f(x):
    if random.randint(0,1):
        time.sleep(0.1)    # simulate a slow job for about half of the inputs
    res = x * x
    q.put(res)

def main():
    # 4 worker threads; leaving the with block waits for every submitted job
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        for num in fred:
            executor.submit(f, num)
    while not q.empty():
        print q.get()

if __name__ == "__main__":
    main()
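A pool of 4 worker threads processes all the jobs.
The with statement guarantees that every submitted job has finished before the code after it runs.
The results are stored in a queue, and the queue is thread-safe.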
“The Queue module implements multi-producer, multi-consumer queues. It is especially useful in threaded programming when information must be exchanged safely between multiple threads. The Queue class in this module implements all the required locking semantics.”
In other words, several worker threads can call q.put() at the same time without any extra locking on our side.
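For comparison, here is a minimal Python 3 sketch of Example 2. In Python 3 the module is spelled queue and concurrent.futures is in the standard library, so no backport is needed. The numbers parameter of main() is an illustrative change so the input is not a global; everything else follows the original.

```python
import queue
import random
import time
from concurrent.futures import ThreadPoolExecutor

q = queue.Queue()  # put() and get() do their own locking, so threads can share it


def f(x):
    if random.randint(0, 1):
        time.sleep(0.1)  # simulate a slow job for roughly half of the inputs
    q.put(x * x)


def main(numbers):
    with ThreadPoolExecutor(max_workers=4) as executor:
        for num in numbers:
            executor.submit(f, num)
    # Leaving the with block waited for every job, so the queue is now complete.
    results = []
    while not q.empty():
        results.append(q.get())
    return results


results = main(range(1, 11))
print(results)  # the order differs from run to run
```

Because the Future objects returned by submit() are thrown away, an exception raised inside f would be silently lost here; keeping the futures and calling result() on them is what surfaces such errors.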
【Example 3】Using ProcessPoolExecutor
“The ProcessPoolExecutor class is an Executor subclass that uses a pool of processes to execute calls asynchronously. ProcessPoolExecutor uses the multiprocessing module, which allows it to side-step the Global Interpreter Lock but also means that only picklable objects can be executed and returned.”
In short, ProcessPoolExecutor runs calls in separate processes and so sidesteps the Global Interpreter Lock, but the callable, its arguments, and its return value must all be picklable.
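Whether something can be sent to a worker process can be checked directly with the pickle module. The sketch below uses a hypothetical helper, is_picklable(); it only approximates what ProcessPoolExecutor does internally when it pickles the callable, its arguments, and the result (Python 3).

```python
import pickle
import queue


def square(x):
    return x * x


def is_picklable(obj):
    """Hypothetical helper: True if pickle can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# A module-level function is pickled by reference (module name + function name).
print(is_picklable(square))
# A lambda has no importable name, so pickling it fails.
print(is_picklable(lambda x: x * x))
# A queue.Queue holds thread locks, which cannot be pickled.
print(is_picklable(queue.Queue()))
```

The last case is also why a thread Queue cannot carry results out of worker processes.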
#!/usr/bin/env python2.7

import sys
import redis
import concurrent.futures

r = redis.Redis()    # connects to localhost:6379 by default

fred = [1,2,3,4,5,6,7,8,9,10]

def check_server():
    try:
        r.info()
    except redis.exceptions.ConnectionError:
        print >>sys.stderr, "Error: cannot connect to redis server. Is the server running?"
        sys.exit(1)

def f(x):
    res = x * x
    r.rpush("test", res)    # RPUSH is atomic, so workers can append concurrently

def main():
    with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:
        for num in fred:
            executor.submit(f, num)
    print r.lrange("test", 0, -1)    # read the whole list back in the parent

if __name__ == "__main__":
    check_server()
    r.delete("test")    # start from an empty list
    main()
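A Queue is not a good choice here because we are using processes, and Queue is made for threads: each worker process would put its results into its own copy of q, which the parent never sees.
So this example stores the results in a Redis list instead (see “redis: getting started”).
Every individual Redis command is atomic, so several processes can safely RPUSH their results onto the same list.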
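Shared storage is one way to get results out of worker processes; another is simply to return them, since ProcessPoolExecutor pickles each return value back to the parent. A minimal Python 3 sketch, assuming the results only need to reach the parent (sharing them with other programs or keeping them across runs is where an external store such as Redis still helps). The names square and run_squares are illustrative.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def square(x):
    # Defined at module level so ProcessPoolExecutor can pickle it by name.
    return x * x


def run_squares(executor_cls, numbers, max_workers=4):
    """Square every number in a pool and return the results in input order."""
    with executor_cls(max_workers=max_workers) as executor:
        # Each return value travels back to the parent through its Future
        # (pickled, in the process-pool case); no shared store is needed.
        return list(executor.map(square, numbers))


if __name__ == "__main__":
    # The guard keeps worker processes that re-import this module from
    # starting pools of their own.
    print(run_squares(ProcessPoolExecutor, range(1, 11)))  # [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```

The if __name__ == "__main__": guard matters on platforms where worker processes are started by re-importing the main module; the same function works unchanged with ThreadPoolExecutor.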
【Test】
1. With fred set to range(1,1000), the timings were:
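[root@typhoeus79 20140811]# time ./basic.py
real    0m49.388s
user    0m0.024s
sys     0m0.013s
[root@typhoeus79 20140811]# time ./thread.py
real    0m12.687s
user    0m0.103s
sys     0m0.061s
[root@typhoeus79 20140811]# time ./process.py
real    0m0.507s
user    0m0.557s
sys     0m0.343s
Most of the time in basic.py and thread.py is spent in time.sleep(0.1), which runs for roughly half of the 999 inputs, so the roughly 4x speedup of thread.py matches max_workers=4. If process.py is the Example 3 script, its f() never sleeps, so its 0.5 s is not directly comparable with the other two timings.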
【When to use threads vs. processes】
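Threads are good for I/O-bound tasks, while processes are good for CPU-bound tasks. In CPython the GIL lets only one thread execute Python bytecode at a time, but a thread waiting on I/O releases it; separate processes each have their own interpreter and GIL.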
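The I/O half of that rule can be seen with a small Python 3 timing sketch. Here time.sleep() stands in for blocking I/O (an assumption: like a socket or disk wait, it releases the GIL), and fake_io and elapsed are hypothetical helpers.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(delay):
    # Stands in for a blocking network or disk call; sleeping releases the GIL.
    time.sleep(delay)
    return delay


def elapsed(func):
    """Return how many seconds func() took to run."""
    start = time.perf_counter()
    func()
    return time.perf_counter() - start


delays = [0.1] * 8

serial = elapsed(lambda: [fake_io(d) for d in delays])
with ThreadPoolExecutor(max_workers=8) as executor:
    threaded = elapsed(lambda: list(executor.map(fake_io, delays)))

print(f"serial:   {serial:.2f}s")
print(f"threaded: {threaded:.2f}s")
```

On an otherwise idle machine the threaded run should take close to 0.1 s against about 0.8 s serially. Replacing the sleep with a pure-Python CPU loop would remove most of that gain because of the GIL, which is where ProcessPoolExecutor becomes the better fit.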
【Executor】
class concurrent.futures.Executor
An abstract class that provides methods to execute calls asynchronously. It should not be used directly, but through its concrete subclasses, ThreadPoolExecutor and ProcessPoolExecutor.
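To see what “abstract” means in practice: in CPython, Executor's own submit() just raises NotImplementedError, while map() and the with-statement support are built on top of submit() and shutdown(). The hypothetical InlineExecutor below (not part of the library, only for illustration or testing; Python 3.8+) implements just those two methods and inherits the rest.

```python
from concurrent.futures import Executor, Future


class InlineExecutor(Executor):
    """Hypothetical executor that runs each call immediately in the caller's thread."""

    def __init__(self):
        self._shut_down = False

    def submit(self, fn, /, *args, **kwargs):
        if self._shut_down:
            raise RuntimeError("cannot schedule new futures after shutdown")
        future = Future()
        try:
            result = fn(*args, **kwargs)
        except Exception as exc:
            future.set_exception(exc)  # stored, re-raised later by result()
        else:
            future.set_result(result)
        return future

    def shutdown(self, wait=True, *, cancel_futures=False):
        # Nothing runs in the background, so there is nothing to wait for.
        self._shut_down = True


# map() and the context-manager methods come from Executor itself.
with InlineExecutor() as executor:
    print(executor.submit(pow, 2, 10).result())  # 1024
    print(list(executor.map(abs, [-1, -2, 3])))  # [1, 2, 3]
```

A real subclass such as ThreadPoolExecutor does the same job, except that submit() hands the call to a worker and returns before it has finished.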
submit(fn, *args, **kwargs)
Schedules the callable, fn, to be executed as fn(*args, **kwargs) and returns a Future object representing the execution of the callable.
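A short Python 3 sketch of what can be done with the returned Future: result() waits for and returns the value, exception() returns (rather than raises) any error, and result() re-raises that error in the caller. divide() is just an illustrative function.

```python
from concurrent.futures import ThreadPoolExecutor


def divide(a, b):
    return a / b


with ThreadPoolExecutor(max_workers=2) as executor:
    ok = executor.submit(divide, 10, b=4)  # positional and keyword args are passed through
    bad = executor.submit(divide, 1, 0)

# Leaving the with block waited for both calls, so both futures are finished.
print(ok.done(), bad.done())  # True True
print(ok.result())            # 2.5
error = bad.exception()       # the ZeroDivisionError, returned instead of raised
print(type(error).__name__)   # ZeroDivisionError

try:
    bad.result()              # re-raises the worker's exception here
except ZeroDivisionError:
    print("result() re-raised the error")
```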
>>> from concurrent.futures import ThreadPoolExecutor
>>> with ThreadPoolExecutor(max_workers=1) as executor:
...     future = executor.submit(pow, 323, 1235)
...     print(future)
...
<Future at 0x7f1e7d053e10 state=finished returned long>
# Print the result rather than the Future object:
>>> with ThreadPoolExecutor(max_workers=1) as executor:
...     future = executor.submit(pow, 323, 1235)
...     print(future.result())
map(func, *iterables, timeout=None)
Equivalent to map(func, *iterables) except func is executed asynchronously and several calls to func may be made concurrently. The returned iterator raises a TimeoutError if __next__() is called and the result isn't available after timeout seconds from the original call to Executor.map(). timeout can be an int or a float. If timeout is not specified or None, there is no limit to the wait time. If a call raises an exception, then that exception will be raised when its value is retrieved from the iterator.
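A Python 3 sketch of those three behaviours: results arrive in input order, an exception surfaces only when iteration reaches it, and timeout counts from the map() call. slow_square and inverse are illustrative functions.

```python
import concurrent.futures
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(x):
    time.sleep(0.05 * (3 - x))  # later inputs finish first...
    return x * x


def inverse(x):
    return 1 / x


with ThreadPoolExecutor(max_workers=3) as executor:
    # ...yet map() still yields results in input order.
    squares = list(executor.map(slow_square, [0, 1, 2]))
    print(squares)  # [0, 1, 4]

    # The failing call does not raise at map() time, only when its value is reached.
    collected = []
    try:
        for value in executor.map(inverse, [1, 2, 0, 4]):
            collected.append(value)
    except ZeroDivisionError:
        print("stopped after", collected)  # stopped after [1.0, 0.5]

    # This job needs 0.5 s but the iterator only waits 0.1 s from the map() call.
    timed_out = False
    try:
        list(executor.map(time.sleep, [0.5], timeout=0.1))
    except concurrent.futures.TimeoutError:
        timed_out = True
    print("timed out:", timed_out)  # timed out: True
```

The timeout only stops the waiting: the sleep that was already running still completes, and leaving the with block waits for it.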
shutdown(wait=True)
Signal the executor that it should free any resources that it is using when the currently pending futures are done executing. Calls to Executor.submit() and Executor.map() made after shutdown will raise RuntimeError.
A with statement makes the explicit call unnecessary: leaving the block shuts the executor down as if shutdown(wait=True) had been called.
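A Python 3 sketch contrasting the explicit call with the with form, and showing the RuntimeError for work submitted after shutdown:

```python
from concurrent.futures import ThreadPoolExecutor

# Explicit form: shutdown(wait=True) blocks until pending work has finished.
executor = ThreadPoolExecutor(max_workers=2)
first = executor.submit(sum, [1, 2, 3])
executor.shutdown(wait=True)
print(first.done(), first.result())  # True 6

# Any submit() after shutdown is rejected.
rejected = False
try:
    executor.submit(sum, [4, 5])
except RuntimeError:
    rejected = True
print("rejected:", rejected)  # rejected: True

# Equivalent with form: leaving the block calls shutdown(wait=True).
with ThreadPoolExecutor(max_workers=2) as executor2:
    second = executor2.submit(sum, [4, 5])
print(second.done(), second.result())  # True 9
```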
>>> import shutil
>>> with ThreadPoolExecutor(max_workers=4) as e:
...     e.submit(shutil.copy, 'src1.txt', 'dest1.txt')
...     e.submit(shutil.copy, 'src2.txt', 'dest2.txt')
...     e.submit(shutil.copy, 'src3.txt', 'dest3.txt')
...     e.submit(shutil.copy, 'src3.txt', 'dest4.txt')
...
<Future at 0x7f1e79191250 state=running>
<Future at 0x7f1e79191450 state=finished raised IOError>
<Future at 0x7f1e79191250 state=running>
<Future at 0x7f1e79191450 state=finished raised IOError>
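In the session above the IOError only shows up in a Future's repr. To act on such failures, the futures can be kept and inspected. A Python 3 sketch with a hypothetical copy_all() helper, using as_completed() and a temporary directory so it runs anywhere; in Python 3 a missing source raises FileNotFoundError, a subclass of OSError (of which IOError is now an alias).

```python
import os
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed


def copy_all(pairs, max_workers=4):
    """Hypothetical helper: copy (src, dst) pairs concurrently.

    Returns {src: None} for successful copies and {src: exception} for failures.
    """
    outcome = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Remember which source path each Future belongs to.
        futures = {executor.submit(shutil.copy, src, dst): src for src, dst in pairs}
        for future in as_completed(futures):  # yields futures as they finish
            outcome[futures[future]] = future.exception()
    return outcome


with tempfile.TemporaryDirectory() as tmp:
    src1 = os.path.join(tmp, "src1.txt")
    missing = os.path.join(tmp, "src2.txt")  # never created
    with open(src1, "w") as fh:
        fh.write("hello")
    outcome = copy_all([(src1, os.path.join(tmp, "dest1.txt")),
                        (missing, os.path.join(tmp, "dest2.txt"))])
    dest1_exists = os.path.exists(os.path.join(tmp, "dest1.txt"))

for src, exc in outcome.items():
    print(os.path.basename(src), "ok" if exc is None else f"failed: {type(exc).__name__}")
```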
【References】
1. https://pythonadventures.wordpress.com/tag/threadpoolexecutor/
2. https://docs.python.org/dev/library/concurrent.futures.html#module-concurrent.futures