• Scraping Ajax-loaded pages with urllib and requests, using Douban Movies as an example


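      This post covers the urllib and requests libraries in Python 3. It briefly explains what Ajax is, then shows how to scrape data that a page loads through Ajax, using Douban Movies as the example, first with urllib and then with requests.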
      
      1. What is Ajax?
      
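      “Ajax stands for ‘Asynchronous JavaScript And XML’ and refers to a web development technique for building interactive web applications. Ajax = asynchronous JavaScript and XML (XML being a subset of the Standard Generalized Markup Language). Ajax is a technique for creating fast, dynamic web pages, and it can update part of a page without reloading the whole page. By exchanging small amounts of data with the server in the background, Ajax lets a page update asynchronously, so one part of the page can change while the rest stays as it is. A traditional web page (one that does not use Ajax) has to reload the entire page whenever its content needs to change.”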
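
For a scraper, the practical consequence is that the data usually arrives as a small JSON payload, separate from the HTML of the page. The following is only an illustrative sketch of that idea, not how a browser is implemented: the `page` dictionary and the `apply_ajax_update` helper are hypothetical names, and the payload is a hand-written one-entry example.

```python
import json

# Hypothetical page state: a static shell plus one dynamic region.
page = {"header": "Douban Movies", "movie_list": []}

def apply_ajax_update(page, payload_text):
    """Return a copy of the page in which only the dynamic region is replaced."""
    payload = json.loads(payload_text)
    updated = dict(page)  # the static parts are carried over unchanged
    updated["movie_list"] = [s["title"] for s in payload["subjects"]]
    return updated

# A hand-written payload in the same shape as an Ajax JSON response.
payload_text = '{"subjects": [{"title": "我不是药神", "rate": "9.0"}]}'
new_page = apply_ajax_update(page, payload_text)
print(new_page)
```

Only `movie_list` changes and `header` is untouched, which mirrors the partial update described in the quote.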

      2. Scraping Douban Movies' Ajax data with urllib:

    import urllib.request
    from urllib import parse
    
    #headers={"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36",
      #       "Referer":"https://movie.douban.com/explore"}
    # Capture Douban's traffic with Fiddler to find the URL of the POST request
    url="https://movie.douban.com/j/search_subjects?type=movie&tag=%E5%8D%8E%E8%AF%AD&sort=recommend"
    formdata={"page_limit":"20","page_start":"0"}
    data=parse.urlencode(formdata)  # URL-encode the form fields
    #print(data)
    # Passing data makes urllib send the request as a POST
    request=urllib.request.Request(url,data=data.encode('utf-8'))
    
    # Use add_header() to set request headers
    request.add_header("User-Agent","Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36")
    
    response=urllib.request.urlopen(request).read()   # send the request and read the response body
    print(response.decode("utf-8"))

    The output is as follows:

    {"subjects":[{"rate":"9.0","cover_x":2810,"title":"我不是药神","url":"https://movie.douban.com/subject/26752088/","playable":true,"cover":"https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2561305376.jpg","id":"26752088","cover_y":3937,"is_new":false},{"rate":"8.5","cover_x":5594,"title":"哪吒之魔童降世","url":"https://movie.douban.com/subject/26794435/","playable":true,"cover":"https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2563780504.jpg","id":"26794435","cover_y":8268,"is_new":false},{"rate":"7.9","cover_x":1786,"title":"流浪地球","url":"https://movie.douban.com/subject/26266893/","playable":true,"cover":"https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2545472803.jpg","id":"26266893","cover_y":2500,"is_new":false},{"rate":"4.7","cover_x":5906,"title":"诛仙 Ⅰ","url":"https://movie.douban.com/subject/25779217/","playable":true,"cover":"https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2567346094.jpg","id":"25779217","cover_y":8268,"is_new":false},{"rate":"8.3","cover_x":5906,"title":"少年的你","url":"https://movie.douban.com/subject/30166972/","playable":true,"cover":"https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2572166063.jpg","id":"30166972","cover_y":8268,"is_new":false},{"rate":"6.5","cover_x":679,"title":"西虹市首富","url":"https://movie.douban.com/subject/27605698/","playable":true,"cover":"https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2529206747.jpg","id":"27605698","cover_y":950,"is_new":false},{"rate":"7.8","cover_x":5906,"title":"我和我的祖国","url":"https://movie.douban.com/subject/32659890/","playable":true,"cover":"https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2567998580.jpg","id":"32659890","cover_y":8268,"is_new":false},{"rate":"7.1","cover_x":1080,"title":"一出好戏","url":"https://movie.douban.com/subject/26985127/","playable":true,"cover":"https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2529571873.jpg","id":"26985127","cover_y":1512,"is_new":false},{"rate":"6.9","cover_x":7142,"titl
e":"飞驰人生","url":"https://movie.douban.com/subject/30163509/","playable":true,"cover":"https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2542973862.jpg","id":"30163509","cover_y":10000,"is_new":false},{"rate":"8.1","cover_x":1429,"title":"无名之辈","url":"https://movie.douban.com/subject/27110296/","playable":true,"cover":"https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2539661066.jpg","id":"27110296","cover_y":2000,"is_new":false},{"rate":"6.7","cover_x":1286,"title":"中国机长","url":"https://movie.douban.com/subject/30295905/","playable":true,"cover":"https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2568258113.jpg","id":"30295905","cover_y":1800,"is_new":false},{"rate":"8.1","cover_x":1000,"title":"无双","url":"https://movie.douban.com/subject/26425063/","playable":true,"cover":"https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2535260806.jpg","id":"26425063","cover_y":1400,"is_new":false},{"rate":"6.4","cover_x":960,"title":"疯狂的外星人","url":"https://movie.douban.com/subject/25986662/","playable":true,"cover":"https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2541901817.jpg","id":"25986662","cover_y":1359,"is_new":false},{"rate":"6.0","cover_x":1080,"title":"囧妈","url":"https://movie.douban.com/subject/30306570/","playable":true,"cover":"https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2581835383.jpg","id":"30306570","cover_y":1542,"is_new":false},{"rate":"7.9","cover_x":5315,"title":"白蛇:缘起","url":"https://movie.douban.com/subject/30331149/","playable":true,"cover":"https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2544313786.jpg","id":"30331149","cover_y":7441,"is_new":false},{"rate":"7.2","cover_x":2999,"title":"动物世界","url":"https://movie.douban.com/subject/26925317/","playable":true,"cover":"https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2525528688.jpg","id":"26925317","cover_y":4181,"is_new":false},{"rate":"7.0","cover_x":2048,"title":"邪不压正","url":"https://movie.douban.com/subject/2636649
6/","playable":true,"cover":"https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2526297221.jpg","id":"26366496","cover_y":2867,"is_new":false},{"rate":"7.4","cover_x":1000,"title":"半个喜剧","url":"https://movie.douban.com/subject/30269016/","playable":true,"cover":"https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2576482356.jpg","id":"30269016","cover_y":1500,"is_new":false},{"rate":"6.8","cover_x":2000,"title":"超时空同居","url":"https://movie.douban.com/subject/27133303/","playable":true,"cover":"https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2520331478.jpg","id":"27133303","cover_y":2800,"is_new":false},{"rate":"8.2","cover_x":2000,"title":"罗小黑战记","url":"https://movie.douban.com/subject/26709258/","playable":true,"cover":"https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2568288336.jpg","id":"26709258","cover_y":3208,"is_new":false}]}
    
    Process finished with exit code 0
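
The response is a JSON document with a single `subjects` list, so it can be parsed with the standard `json` module instead of being printed raw. The sketch below works on a shortened, hand-trimmed sample of the output above (three entries, most fields dropped) so that it runs without network access; `top_rated` is a hypothetical helper name, not part of any library.

```python
import json

# Shortened sample of the response body shown above (three of the twenty
# entries, with most fields removed for brevity).
sample_body = (
    '{"subjects":['
    '{"rate":"9.0","title":"我不是药神","id":"26752088"},'
    '{"rate":"4.7","title":"诛仙 Ⅰ","id":"25779217"},'
    '{"rate":"8.5","title":"哪吒之魔童降世","id":"26794435"}'
    ']}'
)

def top_rated(body, min_rate=0.0):
    """Return (title, rating) pairs with rating >= min_rate, highest first."""
    subjects = json.loads(body)["subjects"]
    # "rate" arrives as a string, so convert it before comparing.
    rated = [(s["title"], float(s["rate"])) for s in subjects]
    kept = [pair for pair in rated if pair[1] >= min_rate]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

movies = top_rated(sample_body, min_rate=8.0)
for title, rate in movies:
    print(f"{rate:.1f}  {title}")
```

With a live response, the same function could be called on `response.decode("utf-8")` from the urllib example.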

     3. Scraping Douban Movies with requests

    import requests
    
    # Browser-like headers; Referer is the page that issues the Ajax call
    headers={"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36",
             "Referer":"https://movie.douban.com/explore"}
    
    url="https://movie.douban.com/j/search_subjects?type=movie&tag=%E5%8D%8E%E8%AF%AD&sort=recommend"
    data={"page_limit":"20","page_start":"0"}
    response=requests.post(url,headers=headers,data=data)  # send the form fields as a POST body
    response.encoding="utf-8"  # decode the body as UTF-8
    print(response.status_code)
    print(response.text)

    The output is as follows:

    200
    {"subjects":[{"rate":"9.0","cover_x":2810,"title":"我不是药神","url":"https://movie.douban.com/subject/26752088/","playable":true,"cover":"https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2561305376.jpg","id":"26752088","cover_y":3937,"is_new":false},{"rate":"8.5","cover_x":5594,"title":"哪吒之魔童降世","url":"https://movie.douban.com/subject/26794435/","playable":true,"cover":"https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2563780504.jpg","id":"26794435","cover_y":8268,"is_new":false},{"rate":"7.9","cover_x":1786,"title":"流浪地球","url":"https://movie.douban.com/subject/26266893/","playable":true,"cover":"https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2545472803.jpg","id":"26266893","cover_y":2500,"is_new":false},{"rate":"4.7","cover_x":5906,"title":"诛仙 Ⅰ","url":"https://movie.douban.com/subject/25779217/","playable":true,"cover":"https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2567346094.jpg","id":"25779217","cover_y":8268,"is_new":false},{"rate":"8.3","cover_x":5906,"title":"少年的你","url":"https://movie.douban.com/subject/30166972/","playable":true,"cover":"https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2572166063.jpg","id":"30166972","cover_y":8268,"is_new":false},{"rate":"6.5","cover_x":679,"title":"西虹市首富","url":"https://movie.douban.com/subject/27605698/","playable":true,"cover":"https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2529206747.jpg","id":"27605698","cover_y":950,"is_new":false},{"rate":"7.8","cover_x":5906,"title":"我和我的祖国","url":"https://movie.douban.com/subject/32659890/","playable":true,"cover":"https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2567998580.jpg","id":"32659890","cover_y":8268,"is_new":false},{"rate":"7.1","cover_x":1080,"title":"一出好戏","url":"https://movie.douban.com/subject/26985127/","playable":true,"cover":"https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2529571873.jpg","id":"26985127","cover_y":1512,"is_new":false},{"rate":"6.9","cover_x":7142,"titl
e":"飞驰人生","url":"https://movie.douban.com/subject/30163509/","playable":true,"cover":"https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2542973862.jpg","id":"30163509","cover_y":10000,"is_new":false},{"rate":"8.1","cover_x":1429,"title":"无名之辈","url":"https://movie.douban.com/subject/27110296/","playable":true,"cover":"https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2539661066.jpg","id":"27110296","cover_y":2000,"is_new":false},{"rate":"6.7","cover_x":1286,"title":"中国机长","url":"https://movie.douban.com/subject/30295905/","playable":true,"cover":"https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2568258113.jpg","id":"30295905","cover_y":1800,"is_new":false},{"rate":"8.1","cover_x":1000,"title":"无双","url":"https://movie.douban.com/subject/26425063/","playable":true,"cover":"https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2535260806.jpg","id":"26425063","cover_y":1400,"is_new":false},{"rate":"6.4","cover_x":960,"title":"疯狂的外星人","url":"https://movie.douban.com/subject/25986662/","playable":true,"cover":"https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2541901817.jpg","id":"25986662","cover_y":1359,"is_new":false},{"rate":"6.0","cover_x":1080,"title":"囧妈","url":"https://movie.douban.com/subject/30306570/","playable":true,"cover":"https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2581835383.jpg","id":"30306570","cover_y":1542,"is_new":false},{"rate":"7.9","cover_x":5315,"title":"白蛇:缘起","url":"https://movie.douban.com/subject/30331149/","playable":true,"cover":"https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2544313786.jpg","id":"30331149","cover_y":7441,"is_new":false},{"rate":"7.2","cover_x":2999,"title":"动物世界","url":"https://movie.douban.com/subject/26925317/","playable":true,"cover":"https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2525528688.jpg","id":"26925317","cover_y":4181,"is_new":false},{"rate":"7.0","cover_x":2048,"title":"邪不压正","url":"https://movie.douban.com/subject/2636649
6/","playable":true,"cover":"https://img3.doubanio.com/view/photo/s_ratio_poster/public/p2526297221.jpg","id":"26366496","cover_y":2867,"is_new":false},{"rate":"7.4","cover_x":1000,"title":"半个喜剧","url":"https://movie.douban.com/subject/30269016/","playable":true,"cover":"https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2576482356.jpg","id":"30269016","cover_y":1500,"is_new":false},{"rate":"6.8","cover_x":2000,"title":"超时空同居","url":"https://movie.douban.com/subject/27133303/","playable":true,"cover":"https://img1.doubanio.com/view/photo/s_ratio_poster/public/p2520331478.jpg","id":"27133303","cover_y":2800,"is_new":false},{"rate":"8.2","cover_x":2000,"title":"罗小黑战记","url":"https://movie.douban.com/subject/26709258/","playable":true,"cover":"https://img9.doubanio.com/view/photo/s_ratio_poster/public/p2568288336.jpg","id":"26709258","cover_y":3208,"is_new":false}]}
    
    Process finished with exit code 0
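
Both examples fetch only the first 20 entries (`page_start` of 0). A possible way to walk through further pages is to increase `page_start` in steps of `page_limit`. The sketch below only builds the URLs and sends nothing. It assumes the endpoint also accepts `page_limit` and `page_start` as query parameters rather than form fields, which the examples above do not confirm, and `build_page_urls` is a hypothetical helper name.

```python
from urllib.parse import urlencode, unquote

BASE_URL = "https://movie.douban.com/j/search_subjects"

def build_page_urls(tag, pages, page_limit=20):
    """Build one URL per page, advancing page_start by page_limit each time."""
    urls = []
    for page in range(pages):
        params = {
            "type": "movie",
            "tag": tag,  # urlencode percent-encodes non-ASCII text as UTF-8
            "sort": "recommend",
            "page_limit": page_limit,
            # Assumption: paging parameters may be sent in the query string.
            "page_start": page * page_limit,
        }
        urls.append(BASE_URL + "?" + urlencode(params))
    return urls

urls = build_page_urls("华语", pages=3)
for u in urls:
    print(u)

# The encoded tag in the URLs used earlier decodes back to the plain text.
print(unquote("%E5%8D%8E%E8%AF%AD"))
```

Each generated URL could then be passed to `requests.get(u, headers=headers)`. Pausing between requests is advisable to avoid overloading the site.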
  • Original article: https://www.cnblogs.com/my-global/p/12441205.html