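selenium is a third-party Python library, so it has to be installed before use. If you are using Anaconda, you may be able to skip this step. Why? It may already be in your environment; a quick import selenium will tell you.
Install command: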
pip install selenium
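Before installing, it can help to check whether selenium is already importable in the current environment (for example under Anaconda). A minimal sketch using only the standard library; the helper name is_installed is made up for this example:

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if `package` can be imported in this environment."""
    # find_spec returns None when no top-level module of that name is found
    return importlib.util.find_spec(package) is not None


if is_installed("selenium"):
    print("selenium is already available")
else:
    print("selenium is missing, run: pip install selenium")
```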
(1) Open a website with selenium, using Taobao as the example.
# -*- coding: utf-8 -*-
"""
Created on Wed Jul 25 10:12:39 2018
@author: brave_man
email: 1979887709@qq.com
"""
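from selenium import webdriver
from selenium.webdriver.common.by import By
from time import sleep

b = webdriver.Chrome()  # launches a local Chrome
b.get("http://www.taobao.com")
# find_element_by_* was deprecated in Selenium 4 and later removed;
# use find_element(By.<strategy>, value) instead
elem = b.find_element(By.ID, 'q')  # the search box
elem.send_keys('iphone')
sleep(3)
elem.clear()  # empty the box before typing the next keyword
elem.send_keys("ipad")
button = b.find_element(By.CLASS_NAME, "btn-search")
button.click()
sleep(5)
b.quit()  # close the browser and stop the driver process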
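If any step in the script above raises (say the search box is not found), the browser window stays open. One way to guard against that is a small context manager that always calls quit(). This is a sketch, not part of the original script: the names managed_browser and run_taobao_search and the factory argument are assumptions made for illustration.

```python
from contextlib import contextmanager


@contextmanager
def managed_browser(factory):
    """Create a driver with `factory` and always quit it afterwards.

    `factory` is any zero-argument callable returning a driver,
    for example webdriver.Chrome.
    """
    driver = factory()
    try:
        yield driver
    finally:
        # Runs on normal exit and when the body raises
        driver.quit()


def run_taobao_search():
    # Imported here so the helper above stays usable without selenium installed
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    with managed_browser(webdriver.Chrome) as b:
        b.get("http://www.taobao.com")
        b.find_element(By.ID, "q").send_keys("ipad")
        b.find_element(By.CLASS_NAME, "btn-search").click()
```

Calling run_taobao_search() opens Chrome, runs the search, and closes the browser even if one of the lookups fails.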
(2) A simple drag-and-drop action (used for CAPTCHA handling)
# -*- coding: utf-8 -*-
"""
Created on Wed Jul 25 15:00:10 2018
@author: brave_man
email: 1979887709@qq.com
"""
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By

b = webdriver.Chrome()
url = "http://www.runoob.com/try/try.php?filename=jqueryui-api-droppable"
b.get(url)
# The demo renders inside an iframe, so switch into it before locating elements
b.switch_to.frame('iframeResult')
sou = b.find_element(By.CSS_SELECTOR, '#draggable')  # element to drag
tar = b.find_element(By.CSS_SELECTOR, '#droppable')  # drop target
actions = ActionChains(b)
actions.drag_and_drop(sou, tar)  # only queues the action
actions.perform()  # runs everything queued so far
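drag_and_drop is a shortcut. When a page needs finer control (for example a short pause while the mouse button is held), the same gesture can be spelled out with lower-level ActionChains calls. A sketch: the helper name drag_step_by_step and the hold_seconds parameter are made up here, and actions is expected to be an ActionChains instance such as ActionChains(b).

```python
def drag_step_by_step(actions, source, target, hold_seconds=0.5):
    """Drag `source` onto `target` using explicit ActionChains steps.

    `actions` should be a selenium ActionChains instance; each call below
    only queues a step, and perform() runs them in order.
    """
    (
        actions.click_and_hold(source)   # press the left button on the source
        .pause(hold_seconds)             # keep holding for a moment
        .move_to_element(target)         # move to the middle of the target
        .release()                       # let go at the current position
        .perform()
    )
```

With the variables from the script above, the call would be drag_step_by_step(ActionChains(b), sou, tar).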
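(3) In a crawler, slow network speeds and other external factors can cause element lookups to fail. Two wait modes are introduced here.
1. Implicit wait: if webdriver does not find the element in the DOM right away, it keeps retrying for up to the specified time and raises a NoSuchElementException only if the element is still missing. This is handy when the network is very slow.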
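from selenium import webdriver
from selenium.webdriver.common.by import By

b = webdriver.Chrome()
b.implicitly_wait(10)  # every element lookup now retries for up to 10 seconds
b.get("https://www.zhihu.com/explore")
elem = b.find_element(By.CLASS_NAME, 'zu-top-add-question')
print(elem)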
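2. Explicit wait: WebDriverWait polls a given condition until it holds, and raises a TimeoutException if the timeout expires first.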
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

b = webdriver.Chrome()
# b.implicitly_wait(10)  # left off: the Selenium docs advise against mixing both wait modes
b.get("https://taobao.com/")
b_wait = WebDriverWait(b, 10)  # wait up to 10 seconds per condition
# presence_of_element_located returns one element; the *_all_* variant returns a list
elem = b_wait.until(EC.presence_of_element_located((By.ID, 'q')))
button = b_wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '.btn-search')))
print(elem, button)
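Conceptually, an explicit wait is just a polling loop: call a condition repeatedly until it returns something truthy or the time runs out. The sketch below mimics that idea in plain Python with no browser involved; wait_until and element_appears are made-up helpers, not Selenium's implementation.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call `condition()` until it returns a truthy value or `timeout` expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)


# Example: the "element" only shows up on the third check
attempts = []


def element_appears():
    attempts.append(1)
    return "element" if len(attempts) >= 3 else None


print(wait_until(element_appears, timeout=2, poll=0.1))  # prints: element
```

WebDriverWait works along these lines: it polls every 0.5 seconds by default and passes the driver to the condition on each call.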
(4) Back and forward
from selenium import webdriver
from time import sleep

b = webdriver.Chrome()
b.get("http://www.baidu.com")
sleep(1)
b.get("http://www.sina.com.cn")
sleep(1)
b.back()  # back to baidu
sleep(3)
b.forward()  # forward to sina again
sleep(3)
b.quit()  # close the browser and stop the driver process
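back() and forward() behave like the browser's own buttons: they walk a history list rather than reloading fixed URLs. The toy model below (plain Python, no browser; the History class is invented for illustration) shows the bookkeeping, including the fact that visiting a new page after going back discards the forward entries.

```python
class History:
    """Toy model of a browser tab's navigation history."""

    def __init__(self):
        self.pages = []
        self.index = -1  # position of the current page, -1 means empty

    def get(self, url):
        # Visiting a new page drops everything after the current position
        self.pages = self.pages[: self.index + 1]
        self.pages.append(url)
        self.index += 1

    def back(self):
        if self.index > 0:
            self.index -= 1

    def forward(self):
        if self.index < len(self.pages) - 1:
            self.index += 1

    @property
    def current(self):
        return self.pages[self.index] if self.pages else None


h = History()
h.get("http://www.baidu.com")
h.get("http://www.sina.com.cn")
h.back()
print(h.current)  # http://www.baidu.com
h.forward()
print(h.current)  # http://www.sina.com.cn
```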
For more, see the documentation: http://selenium-python-zh.readthedocs.io/en/latest/index.html