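I. Detecting asynchronous loading (common JS libraries)
1. jQuery (roughly 70% of sites)
# Search the page source for "jquery"; a match like either of these makes it obvious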
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>
<script src="/Scripts/jquery-1.11.2.min.js"></script>
2. Google Analytics (roughly 50% of sites)
# Search the page source for "Google Analytics"
<!-- Google Analytics -->
<script type="text/javascript">
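The manual search above can be automated. This is a minimal sketch, assuming the page source is already available as a string; the LIBRARY_PATTERNS table and the detect_libraries function are hypothetical names introduced for illustration, and the patterns only cover the markers shown in these notes plus the usual Google Analytics host name.

```python
import re

# Hypothetical marker table: each library maps to a pattern that
# suggests the page loads it. Extend it for other libraries.
LIBRARY_PATTERNS = {
    "jQuery": re.compile(
        r"<script[^>]+src=[\"'][^\"']*jquery[^\"']*\.js", re.IGNORECASE
    ),
    "Google Analytics": re.compile(
        r"google-analytics\.com|<!--\s*Google Analytics\s*-->", re.IGNORECASE
    ),
}


def detect_libraries(html):
    """Return the names of the libraries whose markers appear in the HTML."""
    return [name for name, pattern in LIBRARY_PATTERNS.items()
            if pattern.search(html)]


sample = '<script src="/Scripts/jquery-1.11.2.min.js"></script>'
print(detect_libraries(sample))
```

A hit only suggests that the page may load content with JavaScript; it does not prove that the data you want arrives asynchronously.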
II. Solution
- Install Selenium with pip (pip install selenium)
- Download PhantomJS: http://phantomjs.org/download.html
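1. Ajax (Asynchronous JavaScript and XML)
Ajax lets a page submit a form to the server or fetch data from it without reloading the whole page (e.g. lazy loading, pull-down refresh, loading more content at the bottom of the page...)
2. Dynamic HTML (DHTML)
A collection of techniques (HTML, CSS, and client-side scripts) for changing page content in the browser (e.g. showing content on mouse hover, drop-down menus)
Code implementation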
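Why this matters for scraping: the HTML the server first returns may not yet contain the element you want, because a script adds it later. The sketch below illustrates that with two hypothetical snapshots of a page (the before_ajax and after_ajax strings are invented samples, not captured output) and a small helper, has_element_id, built on the standard library's html.parser.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect every id attribute that appears in an HTML document."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def has_element_id(html, element_id):
    """Return True if some tag in the HTML carries the given id."""
    collector = IdCollector()
    collector.feed(html)
    collector.close()
    return element_id in collector.ids


# Hypothetical snapshots: what a plain HTTP fetch might return, and what
# the DOM might look like once the Ajax call has finished.
before_ajax = '<div id="content">Loading...</div>'
after_ajax = ('<div id="content">Loaded text '
              '<button id="loadedButton">A button</button></div>')

print(has_element_id(before_ajax, "loadedButton"))
print(has_element_id(after_ajax, "loadedButton"))
```

A plain HTTP client only ever sees the first snapshot, which is why a tool that actually runs the page's JavaScript is needed.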
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import time
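# Path to the PhantomJS executable
# (webdriver.PhantomJS needs Selenium 3.x or earlier; it was removed in Selenium 4)
driver = webdriver.PhantomJS(executable_path=r'E:\software\phantomjs-2.1.1-windows\bin\phantomjs.exe')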
driver.get("http://pythonscraping.com/pages/javascript/ajaxDemo.html")
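# Method 1: wait a fixed 3 seconds for the page to load
time.sleep(3)
# Method 2: have Selenium keep checking whether a given element exists, to decide when the page has finished loading (uses the WebDriverWait, EC, and By imports above); either method is enough on its own
try:
    element = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, "loadedButton")))
finally:
    print(driver.page_source)
    driver.close()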
# Get the content
# print(driver.page_source)
#
# driver.close()
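Method 2 is a polling loop: WebDriverWait calls the condition repeatedly (every 0.5 seconds by default) until it returns something truthy or the timeout expires. The sketch below shows the same idea in plain Python without Selenium; wait_until and element_ready are hypothetical names for illustration, and the real WebDriverWait raises Selenium's TimeoutException rather than the built-in TimeoutError used here.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() until it returns a truthy value or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in for a page whose element shows up 0.2 seconds after loading starts.
start = time.monotonic()


def element_ready():
    return "loadedButton" if time.monotonic() - start >= 0.2 else None


print(wait_until(element_ready, timeout=2.0, poll_interval=0.05))
```

Compared with a fixed time.sleep(3), polling returns as soon as the element appears and fails loudly when it never does.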