Scraping JavaScript webpages with webkit | WebScraping.com

Posted 12 Mar 2010 in javascript, python, qt, and webkit

In the previous post I covered how to tackle JavaScript-based websites with Chickenfoot. Chickenfoot is great but not perfect because it:
requires me to program in JavaScript rather than my beloved Python (with all its great libraries)
is slow because I have to wait for Firefox to render the entire webpage
is somewhat buggy and has a small user/developer community, mostly at MIT
An alternative solution that addresses all these points is WebKit, the open source browser engine used most famously in Apple's Safari browser. WebKit has been ported to the Qt framework and can be used from Python through the PyQt bindings.
Here is a simple class that renders a webpage (including executing any JavaScript) and then makes the final HTML available as a string:
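import sys
from PyQt4.QtGui import QApplication
from PyQt4.QtCore import QUrl
from PyQt4.QtWebKit import QWebPage

class Render(QWebPage):
    def __init__(self, url):
        # A QApplication must exist before any other Qt object is created
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished.connect(self._loadFinished)
        self.mainFrame().load(QUrl(url))
        # Block here until _loadFinished() quits the event loop
        self.app.exec_()

    def _loadFinished(self, result):
        # result is True if the page loaded successfully
        self.frame = self.mainFrame()
        self.app.quit()

url = 'http://webscraping.com'
r = Render(url)
# HTML of the rendered page, as produced by WebKit
html = r.frame.toHtml()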
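
Once the rendered HTML is in hand, the rest of the work can stay in Python. The following is only a rough sketch of one way to continue, assuming Python 3 (where toHtml() should give back an ordinary str) and using nothing but the standard library. The names save_html, extract_links and LinkCollector, and the sample string, are made up for illustration and are not part of PyQt or of the class above.

```python
import io
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees (hypothetical helper)."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.links.append(value)


def save_html(html, path):
    """Write the rendered HTML to path as UTF-8 text."""
    with io.open(path, 'w', encoding='utf-8') as f:
        f.write(html)


def extract_links(html):
    """Return the href values of all <a> tags, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Stand-in for r.frame.toHtml(); a made-up sample, not real output
sample = ('<html><body><a href="/blog">Blog</a>'
          '<a href="/contact">Contact</a></body></html>')
print(extract_links(sample))
```

With the Render class above, the same two helpers would be called as save_html(html, 'rendered.html') and extract_links(html), where 'rendered.html' is just an example filename.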

Original article: https://www.cnblogs.com/lexus/p/3579917.html