• python 3.7 coroutines


    
    Coroutines:
    
    Coroutines are one way to implement concurrent programming. At the mention of concurrency, you probably
    think of the multi-threading / multi-processing model. Indeed, multi-threading / multi-processing is one
    of the classic models for solving concurrency problems.
    
    
    node2:/root/python/20200524#cat t1.py 
    import time
    
    def crawl_page(url):
        print('crawling {}'.format(url))
        # the trailing number in the URL is used as the number of seconds to "crawl"
        sleep_time = int(url.split('_')[-1])
        time.sleep(sleep_time)
        print('OK {}'.format(url))
    
    def main(urls):
        for url in urls:
            # note: the timestamp is captured once per URL, so both prints
            # show the time at which this crawl started
            nowTime = time.strftime("%Y-%m-%d %H:%M:%S")
            print(nowTime)
            crawl_page(url)
            print(nowTime)
    
    main(['url_1', 'url_2', 'url_3', 'url_4'])
    
    node2:/root/python/20200524#time python t1.py 
    2020-04-16 22:50:45
    crawling url_1
    OK url_1
    2020-04-16 22:50:45
    
    2020-04-16 22:50:46
    crawling url_2
    OK url_2
    2020-04-16 22:50:46
    
    2020-04-16 22:50:48
    crawling url_3
    OK url_3
    2020-04-16 22:50:48
    
    2020-04-16 22:50:51
    crawling url_4
    OK url_4
    2020-04-16 22:50:51
    
    real	0m10.043s
    user	0m0.022s
    sys	0m0.009s
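
    The total wall time is about 10 seconds: 1 + 2 + 3 + 4. Each time.sleep() blocks the whole program, so
    the four crawls run strictly one after another.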
    
    
    So a very simple idea emerges: crawling operations like this one can be made concurrent. Let's see how
    to write it with coroutines.
    
    
    import asyncio
    
    async def crawl_page(url):
        print('crawling {}'.format(url))
        sleep_time = int(url.split('_')[-1])
        await asyncio.sleep(sleep_time)  # yield to the event loop while "crawling"
        print('OK {}'.format(url))
    
    async def main(urls):
        for url in urls:
            # await blocks here until crawl_page(url) finishes,
            # so the URLs are still processed one by one
            await crawl_page(url)
    
    # %time is an IPython/Jupyter magic; this snippet is run in a notebook
    %time asyncio.run(main(['url_1', 'url_2', 'url_3', 'url_4']))
    
    ########## Output ##########
    
    crawling url_1
    OK url_1
    crawling url_2
    OK url_2
    crawling url_3
    OK url_3
    crawling url_4
    OK url_4
    Wall time: 10 s
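
    The same logic, saved as t2.py on the server (judging by the output, with before/after timestamps added
    as in t1.py), behaves the same: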
    
    
    node2:/root/python/20200524#time python3 t2.py 
    2020-04-16 23:42:52
    crawling url_1
    OK url_1
    2020-04-16 23:42:53
    2020-04-16 23:42:53
    crawling url_2
    OK url_2
    2020-04-16 23:42:55
    2020-04-16 23:42:55
    crawling url_3
    OK url_3
    2020-04-16 23:42:58
    2020-04-16 23:42:58
    crawling url_4
    OK url_4
    2020-04-16 23:43:02
    
    real	0m10.095s
    user	0m0.070s
    sys	0m0.014s
    10 seconds is exactly right. Remember what was said above: await is a synchronous call, so crawl_page(url)
    will not let the next call start until the current one has finished. The effect is exactly the same as
    before; we have essentially written synchronous code with an asynchronous interface.
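
    To get real concurrency we need tasks: once a coroutine is wrapped with asyncio.create_task, it is
    scheduled on the event loop right away, without waiting to be awaited. t3.py below does exactly that.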
    
    node2:/root/python/20200524#cat t3.py 
    
    import asyncio
    
    async def crawl_page(url):
        print('crawling {}'.format(url))
        sleep_time = int(url.split('_')[-1])
        await asyncio.sleep(sleep_time)
        print('OK {}'.format(url))
    
    async def main(urls):
        # create_task schedules every coroutine on the event loop immediately,
        # so all four "crawls" start (almost) at the same time
        tasks = [asyncio.create_task(crawl_page(url)) for url in urls]
        for task in tasks:
            # wait for every task to finish; the sleeps overlap
            await task
    
    asyncio.run(main(['url_1', 'url_2', 'url_3', 'url_4']))
    
    node2:/root/python/20200524#time python3 t3.py
    crawling url_1
    crawling url_2
    crawling url_3
    crawling url_4
    OK url_1
    OK url_2
    OK url_3
    OK url_4
    
    real	0m4.090s
    user	0m0.034s
    sys	0m0.052s
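
    The total wall time drops to about 4 seconds, the length of the longest sleep, rather than the 10-second
    sum: all four tasks run concurrently on the event loop.

    A common alternative to creating the tasks by hand is asyncio.gather, which wraps the coroutines into
    tasks and awaits them all together. A minimal sketch, not from the original post:

    import asyncio

    async def crawl_page(url):
        print('crawling {}'.format(url))
        sleep_time = int(url.split('_')[-1])
        await asyncio.sleep(sleep_time)
        print('OK {}'.format(url))

    async def main(urls):
        # gather schedules every coroutine as a task and waits for all of them;
        # the sleeps overlap, just as in t3.py
        await asyncio.gather(*[crawl_page(url) for url in urls])

    asyncio.run(main(['url_1', 'url_2', 'url_3', 'url_4']))

    This version should also finish in roughly 4 seconds, for the same reason as t3.py.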
    
• Original post: https://www.cnblogs.com/hzcya1995/p/13348363.html