• Crawler mini-example: scraping Douban movies


    Analysis

    https://movie.douban.com/j/new_search_subjects?sort=U&range=0,10&tags=%E7%94%B5%E5%BD%B1,%E9%9D%92%E6%98%A5&start=0&genres=%E5%89%A7%E6%83%85&countries=%E4%B8%AD%E5%9B%BD%E5%A4%A7%E9%99%86&year_range=2019,2019
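Decoding the query string makes the Chinese parameter values readable again. A minimal standard-library sketch (`decode_query` is a hypothetical helper name, not something from the original post):

```python
from urllib.parse import urlsplit, parse_qsl

# The sample search URL from the analysis
URL = ('https://movie.douban.com/j/new_search_subjects?sort=U&range=0,10'
       '&tags=%E7%94%B5%E5%BD%B1,%E9%9D%92%E6%98%A5&start=0'
       '&genres=%E5%89%A7%E6%83%85&countries=%E4%B8%AD%E5%9B%BD%E5%A4%A7%E9%99%86'
       '&year_range=2019,2019')


def decode_query(url):
    """Return the query parameters of `url` as a dict of percent-decoded strings."""
    return dict(parse_qsl(urlsplit(url).query))


for key, value in decode_query(URL).items():
    print(f'{key} = {value}')
```

This prints, among others, `tags = 电影,青春`, `genres = 剧情` and `countries = 中国大陆`.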

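    sort=U: recently popular

    sort=S: highest rated

    range=0,10: rating range filter

    start=0: starting offset into the result list (used for paging)

    year_range=2019,2019: release year range

    tags: film form, e.g. 电影 (movie); multiple tags are comma-separated

    genres: movie genre, e.g. 剧情 (drama)

    countries: country/region, e.g. 中国大陆 (mainland China)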
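Putting the parameters together, here is a sketch of a URL builder. `build_search_url` and `page_urls` are hypothetical helpers, and the page size of 20 used to step `start` is an assumption, not something stated in the analysis:

```python
from urllib.parse import urlencode, quote

BASE = 'https://movie.douban.com/j/new_search_subjects'


def build_search_url(sort='U', rating=(0, 10), tags=(), start=0,
                     genres='', countries='', year_range=None):
    """Build a search URL from the query parameters listed in the analysis."""
    params = {
        'sort': sort,                          # 'U' = recently popular, 'S' = highest rated
        'range': f'{rating[0]},{rating[1]}',   # rating interval
        'tags': ','.join(tags),                # comma-separated tags, e.g. ('电影', '青春')
        'start': start,                        # offset into the result list
        'genres': genres,
        'countries': countries,
    }
    if year_range is not None:
        params['year_range'] = f'{year_range[0]},{year_range[1]}'
    # quote_via=quote percent-encodes UTF-8 bytes; safe=',' keeps commas literal,
    # matching the sample URL
    return BASE + '?' + urlencode(params, safe=',', quote_via=quote)


def page_urls(pages, page_size=20, **filters):
    """Yield one URL per page; page_size=20 is an assumed page size."""
    for page in range(pages):
        yield build_search_url(start=page * page_size, **filters)
```

With `tags=('电影', '青春')`, `genres='剧情'`, `countries='中国大陆'` and `year_range=(2019, 2019)` the builder should reproduce the sample URL from the analysis.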

    from urllib.parse import quote
    print(quote('电影'))  # URL-encode a Chinese tag value: %E7%94%B5%E5%BD%B1
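Each Chinese character here is three UTF-8 bytes, and percent-encoding writes every byte as `%XX`. A hand-rolled version for comparison with `urllib.parse.quote` (`percent_encode` is a hypothetical name; unlike `quote` it also encodes ASCII letters, digits and `_.-~/`, so the two only agree on text without those characters):

```python
from urllib.parse import quote


def percent_encode(text):
    """Encode every UTF-8 byte of `text` as %XX with uppercase hex digits."""
    return ''.join(f'%{byte:02X}' for byte in text.encode('utf-8'))


print(percent_encode('电影'))                   # %E7%94%B5%E5%BD%B1
print(percent_encode('电影') == quote('电影'))  # True
```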

    Code

    from requests_html import HTMLSession
    
    session = HTMLSession()
    
    # Test: the part after '#' is a URL fragment read by the page's JavaScript,
    # so the movie list only exists after the page has been rendered
    url = 'https://movie.douban.com/tag/#/?sort=U&range=0,10&tags=2018'
    r = session.get(url=url)
    r.html.render()    # render() starts a headless Chromium (downloaded on first use) and runs the page's JavaScript
    # print(r.html.html)
    
    movie_element_list = r.html.find('.list-wp a[class="item"]')
    # print(movie_element_list)
    for element in movie_element_list:
        movie_detail_url = element.attrs.get('href')                       # movie detail page URL
        print(movie_detail_url)
        movie_img_url = element.find('img', first=True).attrs.get('src')   # poster image URL
        print(movie_img_url)
        movie_name = element.find('[class="title"]', first=True).text      # movie title
        print(movie_name)
        print('-'*30)
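Instead of rendering the tag page in a browser, the JSON endpoint from the analysis could in principle be queried directly. A standard-library sketch; the `{"data": [...]}` response shape, the `title`/`url`/`cover` field names and the need for a browser-like `User-Agent` are assumptions, not verified against Douban's current API, and `fetch_json`/`parse_subjects` are hypothetical helpers:

```python
import json
from urllib.request import Request, urlopen


def fetch_json(url, timeout=10):
    """GET `url` with a browser-like User-Agent and decode the JSON body."""
    # Assumption: requests without a browser User-Agent may be rejected
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    with urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def parse_subjects(payload):
    """Extract (title, detail URL, cover URL) tuples from an assumed {"data": [...]} payload."""
    movies = []
    for item in payload.get('data', []):
        movies.append((item.get('title'), item.get('url'), item.get('cover')))
    return movies
```

Typical use would be `parse_subjects(fetch_json(url))` with a URL assembled from the query parameters in the analysis; frequent requests may be rate-limited.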


  • Original article: https://www.cnblogs.com/zhangguosheng1121/p/11341388.html