• Python crawler: scraping Toutiao (今日头条) with selenium (AJAX asynchronous loading)


    from selenium import webdriver
    from lxml import etree
    from pyquery import PyQuery as pq
    import time
    
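    # Launch Chrome and open the Toutiao home page
    driver = webdriver.Chrome()
    driver.maximize_window()
    driver.implicitly_wait(10)  # applies to every later element lookup, so set it once
    driver.get('https://www.toutiao.com/')
    # Open the "科技" (Technology) channel; the find_element_by_* helpers
    # were removed in recent Selenium 4 releases, so use find_element()
    driver.find_element('link text', '科技').click()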
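    # Scroll down in steps so the AJAX feed loads more items
    for x in range(3):
        js = "var q=document.documentElement.scrollTop=" + str(x*500)
        driver.execute_script(js)
        time.sleep(2)  # give each batch time to load

    time.sleep(5)  # final pause before reading the rendered page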
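    # Take the rendered HTML, normalize it through PyQuery, then parse with lxml
    page = driver.page_source
    doc = pq(page)
    doc = etree.HTML(str(doc))
    # Each feed entry is an <li> under div.wcommonFeed
    contents = doc.xpath('//div[@class="wcommonFeed"]/ul/li')
    print(contents)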
    # Append each non-empty title to toutiao.txt, one per line
    for x in contents:
        title = x.xpath('div/div[1]/div/div[1]/a/text()')
        if title:
            title = title[0]
            with open('toutiao.txt', 'a+', encoding='utf8') as f:
                f.write(title + '\n')
            print(title)
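The loop above reopens toutiao.txt for every title, and scrolling an AJAX feed can yield the same entry more than once. A minimal sketch of one alternative, assuming the titles have already been collected into a list: the helper name `save_titles` and its `path` parameter are hypothetical and not part of the original script.

```python
def save_titles(titles, path="toutiao.txt"):
    """Append unique, non-empty titles to `path`, one per line.

    Hypothetical helper (not from the original script).
    Returns the number of titles written.
    """
    seen = set()
    cleaned = []
    for title in titles:
        title = title.strip()
        # Skip blanks and titles already seen in this batch
        if title and title not in seen:
            seen.add(title)
            cleaned.append(title)

    # Open the file once instead of once per title
    with open(path, "a", encoding="utf8") as f:
        for title in cleaned:
            f.write(title + "\n")
    return len(cleaned)
```

In the script above, this could be called once after the XPath loop with the list of extracted titles. De-duplication here only covers a single call; titles already in the file from an earlier run are not checked.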
  • Original article: https://www.cnblogs.com/hellangels333/p/8762112.html