• Advanced Python: Concurrent Programming with Futures


    Concurrency vs. parallelism

      Concurrency

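      Concurrency means that several tasks make progress during overlapping periods of time. It does not mean they run at the same instant. The CPython interpreter's internals (such as reference counting) are not thread-safe. To avoid the race conditions this would cause, CPython uses a Global Interpreter Lock (GIL), which allows only one thread to execute Python bytecode at any moment. When a thread blocks on an I/O operation, it releases the GIL so that another thread can run. So in Python, concurrency does not mean several operations (threads or tasks) executing at the same time. At any given instant only one thread or task is running, and the others wait their turn.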
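      Here is a minimal sketch of this behaviour. It assumes time.sleep() as a stand-in for blocking I/O; like a socket read, sleep releases the GIL while it waits. fake_io is a hypothetical helper. Five 0.2 s waits run one after another take about 1 s. Five threads waiting at the same time finish in roughly 0.2 s, even though only one thread executes Python bytecode at any instant.

```python
import threading
import time

def fake_io(delay: float) -> None:
    # Hypothetical stand-in for a blocking I/O call; sleep() releases the GIL while blocked
    time.sleep(delay)

def run_sequential(n: int, delay: float) -> float:
    # Each wait starts only after the previous one has finished
    start = time.perf_counter()
    for _ in range(n):
        fake_io(delay)
    return time.perf_counter() - start

def run_threaded(n: int, delay: float) -> float:
    # All threads block at the same time, so their waits overlap
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_io, args=(delay,)) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every thread has finished
    return time.perf_counter() - start

seq = run_sequential(5, 0.2)
par = run_threaded(5, 0.2)
print(f"sequential: {seq:.2f}s, threaded: {par:.2f}s")
```

The threads take turns holding the GIL. They overlap only because each one gives it up while it is blocked.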
      

      Parallelism

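      Multiple operations, typically separate processes, genuinely execute at the same instant, for example on different CPU cores.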
      
     

    Concurrent programming with Futures

      Single-threaded vs. multithreaded performance

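      Suppose the task is to download the content of a number of websites and print how much was read from each. A single-threaded implementation looks like this: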
    import requests
    import time
    
    def download_one(url):
        resp = requests.get(url)
        print('Read {} from {}'.format(len(resp.content), url))
        
    def download_all(sites):
        for site in sites:
            download_one(site)
    
    def main():
        sites = [
            'https://en.wikipedia.org/wiki/Portal:Arts',
            'https://en.wikipedia.org/wiki/Portal:History',
            'https://en.wikipedia.org/wiki/Portal:Society',
            'https://en.wikipedia.org/wiki/Portal:Biography',
            'https://en.wikipedia.org/wiki/Portal:Mathematics',
            'https://en.wikipedia.org/wiki/Portal:Technology',
            'https://en.wikipedia.org/wiki/Portal:Geography',
            'https://en.wikipedia.org/wiki/Portal:Science',
            'https://en.wikipedia.org/wiki/Computer_science',
            'https://en.wikipedia.org/wiki/Python_(programming_language)',
            'https://en.wikipedia.org/wiki/Java_(programming_language)',
            'https://en.wikipedia.org/wiki/PHP',
            'https://en.wikipedia.org/wiki/Node.js',
            'https://en.wikipedia.org/wiki/The_C_Programming_Language',
            'https://en.wikipedia.org/wiki/Go_(programming_language)'
        ]
        start_time = time.perf_counter()
        download_all(sites)
        end_time = time.perf_counter()
        print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))
        
    if __name__ == '__main__':
        main()
    
    # Output
    Read 129196 from https://en.wikipedia.org/wiki/Portal:Arts
    Read 183867 from https://en.wikipedia.org/wiki/Portal:History
    Read 224161 from https://en.wikipedia.org/wiki/Portal:Society
    Read 114387 from https://en.wikipedia.org/wiki/Portal:Biography
    Read 152871 from https://en.wikipedia.org/wiki/Portal:Mathematics
    Read 156339 from https://en.wikipedia.org/wiki/Portal:Technology
    Read 162872 from https://en.wikipedia.org/wiki/Portal:Geography
    Read 91504 from https://en.wikipedia.org/wiki/Portal:Science
    Read 323262 from https://en.wikipedia.org/wiki/Computer_science
    Read 391073 from https://en.wikipedia.org/wiki/Python_(programming_language)
    Read 319710 from https://en.wikipedia.org/wiki/Java_(programming_language)
    Read 470754 from https://en.wikipedia.org/wiki/PHP
    Read 180774 from https://en.wikipedia.org/wiki/Node.js
    Read 56799 from https://en.wikipedia.org/wiki/The_C_Programming_Language
    Read 325451 from https://en.wikipedia.org/wiki/Go_(programming_language)
    Download 15 sites in 67.349395015 seconds
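      The flow of this code: iterate over the list of sites and download the current site. Only after that download has finished does it move on to the next one, until the list is exhausted. Every request waits for the previous one, so the total time is roughly the sum of all the individual download times.
      Next, the multithreaded version: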
    import concurrent.futures
    import requests
    import time
    
    def download_one(url):
        try:
            resp = requests.get(url)
            print('Read {} from {}'.format(len(resp.content), url))
        except Exception as ex:
            print(ex)
    
    def download_all(sites):
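        with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
            # Schedule download_one(site) for every site on the 5 worker threads;
            # leaving the with block waits until all downloads have finished
            results = executor.map(download_one, sites)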
        # with concurrent.futures.ProcessPoolExecutor() as executor:
        #     results = executor.map(download_one,sites)
    
    def main():
        sites = [
            'https://en.wikipedia.org/wiki/Portal:Arts',
            'https://en.wikipedia.org/wiki/Portal:History',
            'https://en.wikipedia.org/wiki/Portal:Society',
            'https://en.wikipedia.org/wiki/Portal:Biography',
            'https://en.wikipedia.org/wiki/Portal:Mathematics',
            'https://en.wikipedia.org/wiki/Portal:Technology',
            'https://en.wikipedia.org/wiki/Portal:Geography',
            'https://en.wikipedia.org/wiki/Portal:Science',
            'https://en.wikipedia.org/wiki/Computer_science',
            'https://en.wikipedia.org/wiki/Python_(programming_language)',
            'https://en.wikipedia.org/wiki/Java_(programming_language)',
            'https://en.wikipedia.org/wiki/PHP',
            'https://en.wikipedia.org/wiki/Node.js',
            'https://en.wikipedia.org/wiki/The_C_Programming_Language',
            'https://en.wikipedia.org/wiki/Go_(programming_language)'
        ]
        start_time = time.perf_counter()
        download_all(sites)
        end_time = time.perf_counter()
        print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))
    
    if __name__ == '__main__':
        main()
    
    # Output
    Read 114387 from https://en.wikipedia.org/wiki/Portal:Biography
    Read 129196 from https://en.wikipedia.org/wiki/Portal:Arts
    Read 183867 from https://en.wikipedia.org/wiki/Portal:History
    Read 152871 from https://en.wikipedia.org/wiki/Portal:Mathematics
    Read 224161 from https://en.wikipedia.org/wiki/Portal:Society
    Read 156339 from https://en.wikipedia.org/wiki/Portal:Technology
    Read 91504 from https://en.wikipedia.org/wiki/Portal:Science
    Read 391073 from https://en.wikipedia.org/wiki/Python_(programming_language)
    Read 162872 from https://en.wikipedia.org/wiki/Portal:Geography
    Read 323262 from https://en.wikipedia.org/wiki/Computer_science
    Read 56799 from https://en.wikipedia.org/wiki/The_C_Programming_Language
    Read 319710 from https://en.wikipedia.org/wiki/Java_(programming_language)
    Read 325451 from https://en.wikipedia.org/wiki/Go_(programming_language)
    Read 180774 from https://en.wikipedia.org/wiki/Node.js
    Read 470754 from https://en.wikipedia.org/wiki/PHP
    Download 15 sites in 10.022916933 seconds
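      This version runs roughly 6.7 times faster (67.3 s vs. 10.0 s in these runs). ThreadPoolExecutor creates a thread pool, and max_workers=5 gives it 5 threads. executor.map(download_one, sites) then calls download_one concurrently for each element of sites. Calling requests.get() from several threads is safe in practice because each call creates and closes its own session. A single requests.Session shared between threads, however, is not documented as thread-safe. The number of threads is configurable, but too many threads increase system overhead (memory and context switching). It is worth benchmarking to find the best thread count for the actual workload.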
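      Here is a small offline sketch of what max_workers and executor.map() do. It uses a hypothetical fake_download() that sleeps 0.1 s in place of requests.get(). With 15 tasks and 5 threads, the pool works through roughly three rounds. map() returns results in the order of its input, not the order in which the calls finish.

```python
import concurrent.futures
import time

def fake_download(i: int) -> int:
    # Hypothetical stand-in for download_one(): wait 0.1 s, then return a value
    time.sleep(0.1)
    return i * 10

start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # map() returns an iterator; consuming it yields results in input order
    results = list(executor.map(fake_download, range(15)))
elapsed = time.perf_counter() - start

print(results)
print(f"15 tasks, 5 workers: {elapsed:.2f}s")  # roughly 3 rounds of 0.1 s
```

In the article's download_all(), the iterator returned by map() is never consumed. If download_one() raised, the exception would be stored in its future and never seen, so the try/except inside download_one() is what makes failures visible.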
      The same program can also be made parallel. In download_all(), replace
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    =>
    with concurrent.futures.ProcessPoolExecutor() as executor:

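      For an I/O-bound workload like this, the parallel (multi-process) version is not faster than the concurrent (multithreaded) one. The work consists mostly of waiting on the network, and starting worker processes and passing data between them adds overhead.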

    What exactly are Futures?

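       Python provides Futures in two modules, concurrent.futures and asyncio. In both, a Future object represents an operation whose result is not available yet. When a callable is submitted to an executor, the executor queues the pending call and immediately returns a Future that wraps it. The Future's state can be queried at any time, for example with done(). Once the operation has finished, its result or its exception can be retrieved with result() or exception().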
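       Here is a sketch of the Future life cycle using concurrent.futures. wait_for_gate and fail are hypothetical helpers, and a threading.Event is used only to make the timing deterministic. The example shows how each method behaves:
       - submit() returns a Future at once.
       - done() reports whether the Future has finished.
       - result() waits for the return value.
       - exception() returns a raised exception instead of re-raising it.
       - add_done_callback() registers a function to run when the Future completes.

```python
import concurrent.futures
import threading

gate = threading.Event()
callback_log = []

def wait_for_gate(value: int) -> int:
    # Hypothetical task: block until the main thread opens the gate, then square the value
    gate.wait()
    return value * value

def fail() -> None:
    # Hypothetical task that always raises
    raise ValueError("boom")

with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    fut = executor.submit(wait_for_gate, 7)  # returns a Future immediately
    # The callback receives the finished Future and runs in the worker thread
    fut.add_done_callback(lambda f: callback_log.append(f.result()))
    print(fut.done())    # False: the call is still blocked on the gate
    gate.set()           # let the call finish
    print(fut.result())  # waits for completion, then prints 49
    bad = executor.submit(fail)
    # exception() waits for completion and returns the exception instead of raising it
    print(repr(bad.exception()))  # ValueError('boom')

# Leaving the with block waits for all workers, so the callback has run by now
print(fut.done(), callback_log)  # True [49]
```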
    import concurrent.futures
    import requests
    import time
    
    def download_one(url):
        resp = requests.get(url)
        print('Read {} from {}'.format(len(resp.content), url))
        return f'download {len(resp.content)} ok'
    
    # def over(arg):
    #     print(arg)
    #     print('over')
    
    def download_all(sites):
        # Futures may finish in a different order from the one in which they
        # were submitted; this depends on scheduling and on each download's duration
        with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
            to_do = []
            for site in sites:
                # executor.submit() schedules the call and returns a Future instance
                future = executor.submit(download_one, site)
                to_do.append(future)
                #future.add_done_callback(over)
            
            # as_completed() yields each future as soon as it finishes;
            # print its result, or its exception if the download failed
            for future in concurrent.futures.as_completed(to_do):
                if future.exception() is not None:
                    print(future.exception())
                else:
                    print(future.result())
    
    def main():
        sites = [
            'https://en.wikipedia.org/wiki/Portal:Arts',
            'https://en.wikipedia.org/wiki/Portal:History',
            'https://en.wikipedia.org/wiki/Portal:Society',
            'https://en.wikipedia.org/wiki/Portal:Biography',
            'https://en.wikipedia.org/wiki/Portal:Mathematics',
            'https://en.wikipedia.org/wiki/Portal:Technology',
            'https://en.wikipedia.org/wiki/Portal:Geography',
            'https://en.wikipedia.org/wiki/Portal:Science',
            'https://en.wikipedia.org/wiki/Computer_science',
            'https://en.wikipedia.org/wiki/Python_(programming_language)',
            'https://en.wikipedia.org/wiki/Java_(programming_language)',
            'https://en.wikipedia.org/wiki/PHP',
            'https://en.wikipedia.org/wiki/Node.js',
            'https://en.wikipedia.org/wiki/The_C_Programming_Language',
            'https://en.wikipedia.org/wiki/Go_(programming_language)'
        ]
        start_time = time.perf_counter()
        download_all(sites)
        end_time = time.perf_counter()
        print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))
    
    if __name__ == '__main__':
        main()
    
    # Output
    Read 129886 from https://en.wikipedia.org/wiki/Portal:Arts
    Read 107634 from https://en.wikipedia.org/wiki/Portal:Biography
    Read 224118 from https://en.wikipedia.org/wiki/Portal:Society
    Read 158984 from https://en.wikipedia.org/wiki/Portal:Mathematics
    Read 184343 from https://en.wikipedia.org/wiki/Portal:History
    Read 157949 from https://en.wikipedia.org/wiki/Portal:Technology
    Read 167923 from https://en.wikipedia.org/wiki/Portal:Geography
    Read 94228 from https://en.wikipedia.org/wiki/Portal:Science
    Read 391905 from https://en.wikipedia.org/wiki/Python_(programming_language)
    Read 321352 from https://en.wikipedia.org/wiki/Computer_science
    Read 180298 from https://en.wikipedia.org/wiki/Node.js
    Read 321417 from https://en.wikipedia.org/wiki/Java_(programming_language)
    Read 468421 from https://en.wikipedia.org/wiki/PHP
    Read 56765 from https://en.wikipedia.org/wiki/The_C_Programming_Language
    Read 324039 from https://en.wikipedia.org/wiki/Go_(programming_language)
    Download 15 sites in 0.21698231499976828 seconds

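      The futures do not necessarily finish in the order in which they appear in the list. Which one completes first depends on thread scheduling and on how long each task takes. as_completed() yields each future as soon as it finishes, so results are handled in completion order rather than submission order.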
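      Here is a minimal sketch of the difference between submission order and completion order. It assumes three hypothetical tasks that sleep for different lengths of time:

```python
import concurrent.futures
import time

def task(delay: float) -> float:
    # Hypothetical task: sleep for `delay` seconds and return the delay
    time.sleep(delay)
    return delay

delays = [0.3, 0.1, 0.2]
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(task, d) for d in delays]
    # as_completed() yields each future as soon as it finishes
    completion_order = [f.result() for f in concurrent.futures.as_completed(futures)]
    # Iterating the list itself keeps submission order
    submission_order = [f.result() for f in futures]

print(completion_order)  # [0.1, 0.2, 0.3] in a typical run
print(submission_order)  # [0.3, 0.1, 0.2]
```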

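      Concurrency is generally the right tool for I/O-bound workloads, while parallelism suits CPU-bound workloads. In CPython, the GIL prevents threads from running Python code on several cores at once, so CPU-heavy work needs multiple processes to scale.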
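      Here is a rough sketch of the CPU-bound case. It assumes a pure-Python loop as the heavy task; cpu_heavy and run are hypothetical helpers, not from the article. The sketch also shows that ThreadPoolExecutor and ProcessPoolExecutor share the same Executor interface, which is why changing one line in download_all() is enough to switch between them. With ProcessPoolExecutor, the worker function must be defined at module level so it can be pickled. The pool should also be created under `if __name__ == "__main__":`. On a multi-core machine with a standard CPython build, the process pool should finish noticeably faster than the thread pool. Exact timings depend on the machine.

```python
import concurrent.futures
import time

def cpu_heavy(n: int) -> int:
    # Hypothetical CPU-bound task: a pure-Python loop that holds the GIL throughout
    total = 0
    for i in range(n):
        total += i * i
    return total

def run(executor_cls, jobs):
    # Both executor classes implement the same Executor interface,
    # so only the class passed in changes between the two runs
    start = time.perf_counter()
    with executor_cls(max_workers=4) as executor:
        results = list(executor.map(cpu_heavy, jobs))
    return results, time.perf_counter() - start

if __name__ == "__main__":
    jobs = [1_000_000] * 4
    thread_results, thread_time = run(concurrent.futures.ThreadPoolExecutor, jobs)
    process_results, process_time = run(concurrent.futures.ProcessPoolExecutor, jobs)
    print(f"threads:   {thread_time:.2f}s")
    print(f"processes: {process_time:.2f}s")
```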

     References

       Geek Time (极客时间) column 《Python核心技术与实战》 (Python Core Technology and Practice)

  • Original article: https://www.cnblogs.com/xiaoguanqiu/p/11136665.html