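PhantomJS is a scriptable headless web browser. It runs on Windows, macOS, Linux, and FreeBSD.
Built on QtWebKit, it offers fast, native support for a range of web standards: DOM handling, CSS selectors, JSON, Canvas, and SVG.
Note: PhantomJS performance degrades severely when it is run from multiple processes.
Download the build for your platform from the PhantomJS website: http://phantomjs.org/download.html
Basic usage: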
from selenium import webdriver
# from time import sleep
# Point executable_path at the downloaded PhantomJS binary.
browser = webdriver.PhantomJS(executable_path='D:/selenium/phantomjs.exe')
browser.get("http://httpbin.org/ip")
print(browser.page_source)
# sleep(10)
browser.quit()
Output:
UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '
<html><head></head><body><pre style="word-wrap: break-word; white-space: pre-wrap;">{
"origin": "194.156.230.140, 194.156.230.140"
}
</pre></body></html>
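The page source printed above is not raw JSON: the browser wraps the httpbin response in a `<pre>` element. As a minimal sketch, assuming the body always arrives inside a single `<pre>` tag as in this output, the JSON could be recovered with the standard library alone. The helper name `extract_json_from_pre` is made up for this illustration.

```python
import json
import re
from html import unescape


def extract_json_from_pre(page_source):
    """Return the JSON document found inside the first <pre> element."""
    match = re.search(r"<pre[^>]*>(.*?)</pre>", page_source, re.DOTALL)
    if match is None:
        raise ValueError("no <pre> element in page source")
    # The browser HTML-escapes characters such as & and <, so undo that first.
    return json.loads(unescape(match.group(1)))


# Sample copied from the output shown above.
sample = (
    '<html><head></head><body>'
    '<pre style="word-wrap: break-word; white-space: pre-wrap;">{\n'
    '"origin": "194.156.230.140, 194.156.230.140"\n'
    '}\n'
    '</pre></body></html>'
)
print(extract_json_from_pre(sample)["origin"])
```

In a real session, `browser.page_source` would be passed in place of `sample`.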
The warning says that Selenium support for PhantomJS has been deprecated and recommends using a headless version of Chrome or Firefox instead.
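Following the warning's advice, a headless Chrome session could take the place of PhantomJS. The sketch below is not from the original article: it assumes a Selenium release whose `webdriver.Chrome` accepts the `options` keyword, and a `chromedriver` that Selenium can locate (for example on `PATH`). The helper names `headless_arguments` and `fetch_page_source` are hypothetical.

```python
def headless_arguments(width=800, height=600):
    """Command-line switches that make Chrome run without a visible window."""
    return ["--headless", "--window-size={},{}".format(width, height)]


def fetch_page_source(url):
    """Open url in headless Chrome and return the rendered HTML."""
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for argument in headless_arguments():
        options.add_argument(argument)
    # Assumption: chromedriver can be found by Selenium without an explicit path.
    browser = webdriver.Chrome(options=options)
    try:
        browser.get(url)
        return browser.page_source
    finally:
        browser.quit()  # always release the browser process
```

Calling `fetch_page_source("http://httpbin.org/ip")` should return the same kind of HTML as the PhantomJS example, without the deprecation warning.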
Next, we look at pyvirtualdisplay, which runs the browser inside a virtual X display.
Taking Ubuntu as an example:
root@onefine-virtual-machine:/home/onefine# cat /proc/version
Linux version 4.18.0-13-generic (buildd@lgw01-amd64-048) (gcc version 8.2.0 (Ubuntu 8.2.0-7ubuntu1)) #14-Ubuntu SMP Wed Dec 5 09:04:24 UTC 2018
root@onefine-virtual-machine:/home/onefine#
Install pyvirtualdisplay:
pip install pyvirtualdisplay
Install xvfb:
sudo apt install xvfb
Install the Chrome browser: https://www.google.com/chrome/
Install chromedriver (choose the release that matches your Chrome version): https://sites.google.com/a/chromium.org/chromedriver/downloads
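Before running the examples, it can help to confirm that the installed tools are actually reachable. The following is a rough sketch using only the standard library. The executable names in `REQUIRED` are assumptions about a typical Ubuntu setup, and `missing_tools` is a hypothetical helper. Because the examples in this article load `chromedriver` from the current directory, a miss for `chromedriver` here is not necessarily an error.

```python
import shutil


def missing_tools(names):
    """Return the entries of names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# Assumed executable names; adjust them to match your installation.
REQUIRED = ["Xvfb", "google-chrome", "chromedriver"]

missing = missing_tools(REQUIRED)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required tools were found.")
```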
Examples:
With the browser window visible:
from selenium import webdriver
from pyvirtualdisplay import Display
from time import sleep
# visible=1 shows the browser in a nested Xephyr window (Xephyr must be installed).
xephyr = Display(visible=1, size=(800, 600)).start()
url = "http://www.baidu.com"
browser = webdriver.Chrome(executable_path='./chromedriver')
browser.get(url)
sleep(5)  # keep the window open briefly so it can be seen
browser.quit()
xephyr.stop()
Without a visible browser window:
from selenium import webdriver
from pyvirtualdisplay import Display
from time import sleep
# visible=0 renders into an off-screen Xvfb display, so no window appears.
display = Display(visible=0, size=(800, 600)).start()
url = "http://www.baidu.com"
browser = webdriver.Chrome(executable_path='./chromedriver')
browser.get(url)
print('browser.page_source', browser.page_source)
sleep(5)
browser.quit()
display.stop()
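In both examples, an exception raised while loading the page would skip the calls that quit the browser and stop the display, leaving both processes running. One possible way to guard against that is a small context manager, sketched here. The names `virtual_browser`, `make_xvfb_display`, `make_chrome_driver`, and `main` are hypothetical, and the display and driver are passed in as factories only so the cleanup logic can be exercised without a real browser.

```python
from contextlib import contextmanager


@contextmanager
def virtual_browser(make_display, make_driver):
    """Start a display, then a browser; release both even if an error occurs."""
    display = make_display()
    display.start()
    try:
        driver = make_driver()
        try:
            yield driver
        finally:
            driver.quit()   # close the browser first
    finally:
        display.stop()      # then shut down the virtual display


def make_xvfb_display():
    # Lazy import: pyvirtualdisplay is only needed when this is called.
    from pyvirtualdisplay import Display
    return Display(visible=0, size=(800, 600))


def make_chrome_driver():
    # Same constructor call as in the article's examples (Selenium 3 style).
    from selenium import webdriver
    return webdriver.Chrome(executable_path='./chromedriver')


def main():
    with virtual_browser(make_xvfb_display, make_chrome_driver) as browser:
        browser.get("http://www.baidu.com")
        print(len(browser.page_source))
```

Calling `main()` performs the same steps as the hidden-window example, but the browser and the display are shut down on every exit path.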
About scrapy-splash
Details: https://github.com/scrapy-plugins/scrapy-splash
About selenium-grid
Details: https://docs.seleniumhq.org/docs/07_selenium_grid.jsp
About splinter
Details: https://github.com/cobrateam/splinter
References:
How to get the HTTP status code while opening a page with RemoteWebDriver https://www.cnblogs.com/lexfu/p/5288299.html
Causes of and fixes for errors when using Selenium + PhantomJS https://blog.csdn.net/u010358168/article/details/79749149
PyVirtualDisplay https://pyvirtualdisplay.readthedocs.io/en/latest/
Simulating a browser with Selenium without opening a browser window http://www.leesven.com/2401.html