最近一直在用phantomjs 自动登陆并爬取一些数据,突然发现爬取https类型的网站的时候无法正常操作了
困扰了两天的问题在经过google和stackoverflow的一番搜索后发现原来Phantomjs中有个service_args参数可以忽略https错误
在Linux Centos服务器上本来想用Xvfb+Firefox和chrome解决,但是配置了好几个版本的都无法正常运行
# coding=utf-8
import time
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support.ui import WebDriverWait
ua = "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.3 Safari/537.36"
cap = webdriver.DesiredCapabilities.PHANTOMJS
cap["phantomjs.page.settings.resourceTimeout"] = 200000
cap["phantomjs.page.settings.loadImages"] = True
cap["phantomjs.page.settings.disk-cache"] = True
cap["phantomjs.page.settings.userAgent"] = ua
cap["phantomjs.page.customHeaders.User-Agent"] =ua
cap["phantomjs.page.customHeaders.Referer"] = "http://tj.ac.10086.cn/login/"
driver = webdriver.PhantomJS(desired_capabilities=cap, service_args=['--ignore-ssl-errors=true'])
到此发现问题解决