The selenium module

Introduction to the selenium module

Key points:
1. Import webdriver
2. Create a webdriver object
3. Run headless (no browser window)
4. Set the window size
5. implicitly_wait
6. driver.find_element_by_xpath('')
7. element.find_element_by_xpath('')
8. element.get_attribute('title')
9. When paging, call click() last so that no StaleElementReferenceException is raised

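import time

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

options = webdriver.ChromeOptions()
# Run headless (no visible browser window)
options.add_argument('--headless')
# Create a Chrome browser object and attach the options to it.
# Pass options=...; the older chrome_options=... keyword is deprecated.
driver = webdriver.Chrome(options=options)
# Set the window size
driver.set_window_size(1920, 1080)
# Switch the window to full-screen mode
# driver.fullscreen_window()
# Get the window's x, y coordinates and its current width and height
# print(driver.get_window_rect())
# Implicit wait: each element lookup keeps polling for up to 5 s
# before giving up if the element has not appeared yet
driver.implicitly_wait(5)
url = 'http://www.zongheng.com/rank/details.html?rt=6&d=1&p=1'
# Have the driver send a GET request for the url
driver.get(url)
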
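# Note: the find_element(s)_by_* helpers used here are the Selenium 3 API;
# Selenium 4.3+ removed them in favor of find_element(By.XPATH, ...).
while True:
    # Scrape every book entry on the current page
    book_list = driver.find_elements_by_xpath(
        '//div[contains(@class,"rank_d_list")]')
    print(book_list)
    for book in book_list:
        name = book.get_attribute('bookname')
        href = book.find_element_by_xpath('.//div/a').get_attribute('href')
        author = book.find_element_by_xpath('.//div[2]/div[2]').get_attribute(
            'title')
        info = book.find_element_by_xpath('.//div[2]/div[3]').text
        print(name, href, author, info)
    try:
        # Locate the "next page" link (its title attribute is "下一页")
        next_page = driver.find_element_by_xpath('//a[@title="下一页"]')
    except NoSuchElementException:
        print('Finished crawling!')
        break
    else:
        time.sleep(2)
        print('Crawling page {page}'.format(page=next_page.get_attribute('page')))
        # next_page.click() must come last: the click loads a new page, and
        # any element located before it (including next_page) then raises
        # StaleElementReferenceException when it is used again
        next_page.click()

# Close all windows and end the browser session
driver.quit()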
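
The same paging loop can be sketched in the locator style that Selenium 4 keeps, `find_elements(by, value)`. This is a minimal sketch, not the original author's code: `scrape_page`, `crawl`, and `max_pages` are hypothetical names, only the name and link of each book are collected, and the string "xpath" is the value of `By.XPATH`. It relies on `find_elements` returning an empty list when nothing matches, so no `NoSuchElementException` has to be caught, and it locates every element again after each click, so no stale reference is reused.

```python
XPATH = "xpath"  # same value as selenium.webdriver.common.by.By.XPATH


def scrape_page(driver):
    """Return a (name, href) tuple for every book row on the current page."""
    rows = driver.find_elements(XPATH, '//div[contains(@class,"rank_d_list")]')
    return [
        (row.get_attribute("bookname"),
         row.find_element(XPATH, ".//div/a").get_attribute("href"))
        for row in rows
    ]


def crawl(driver, max_pages=100):
    """Scrape page after page until no "next page" link is left.

    max_pages is an assumed safety limit so the loop always terminates.
    """
    results = []
    for _ in range(max_pages):
        # Elements are looked up fresh on every page, never carried over
        results.extend(scrape_page(driver))
        # find_elements returns [] on the last page instead of raising
        next_links = driver.find_elements(XPATH, '//a[@title="下一页"]')
        if not next_links:
            break
        # Click last: everything located above becomes stale after this
        next_links[0].click()
    return results
```

With a real browser this would be called as `crawl(driver)` after `driver.get(url)`; the site may still need a short pause or an explicit wait after each click.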

Other webdriver methods

from selenium import webdriver
driver = webdriver.Chrome()

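# Close the current window
1. close()
# Close all windows and end the session
2. quit()
# Execute a JavaScript statement
3. execute_script(script, *args)
# Set a cookie in the current session
4. add_cookie(cookie_dict)
# Delete all cookies in the session
5. delete_all_cookies()
# Find an element by class name
6. find_element_by_class_name(name)
# Find an element by CSS selector
7. find_element_by_css_selector(css_selector)
# Find an element by id
8. find_element_by_id(id_)
# Find an element by its link text
9. find_element_by_link_text(link_text)
# Find an element by its name attribute
10. find_element_by_name(name)
# Find an element by a partial match on its link text
11. find_element_by_partial_link_text(link_text)
# Find an element by tag name
12. find_element_by_tag_name(name)
# Find an element by xpath
13. find_element_by_xpath(xpath)
# Get the window's x, y coordinates and its width and height
14. get_window_rect()
# Get the url of the current page (a property, not a method)
15. current_url
# Return the handle of the current window (a property)
16. current_window_handle
# Return the handles of all windows in the current session (a property)
17. window_handles
# Simulate typing into an element (called on an element, not on the driver);
# the element must be an input box or another field that accepts text
18. send_keys()
# Click the element (called on an element)
19. click()
# Submit the form (called on an element)
20. submit()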
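
A few of the methods above can be combined as in the following sketch. The helper names (`restore_session`, `scroll_to_bottom`, `close_extra_windows`) are hypothetical and only wrap calls from the list; `driver.switch_to.window(handle)` is a real webdriver call that is not in the list but is needed to move between windows. Note that a browser usually accepts `add_cookie` only after a page on the cookie's domain has been loaded.

```python
def restore_session(driver, cookies):
    """Replace the session's cookies with the given list of cookie dicts."""
    driver.delete_all_cookies()
    for cookie in cookies:
        # Each dict needs at least the keys "name" and "value"
        driver.add_cookie(cookie)


def scroll_to_bottom(driver):
    """Scroll to the bottom and return the page height the browser reports."""
    return driver.execute_script(
        "window.scrollTo(0, document.body.scrollHeight);"
        "return document.body.scrollHeight;")


def close_extra_windows(driver):
    """Close every window except the current one, then switch back to it."""
    main = driver.current_window_handle
    # Copy the handle list first, because it changes as windows are closed
    for handle in list(driver.window_handles):
        if handle != main:
            driver.switch_to.window(handle)
            driver.close()  # closes only the window that currently has focus
    driver.switch_to.window(main)
```

Here `close()` is used inside the loop because it closes a single window; `quit()` would end the whole session.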

Original post: https://www.cnblogs.com/louyifei0824/p/9886455.html