• Scraping JavaScript webpages with webkit | WebScraping.com


    Scraping JavaScript webpages with webkit
    Posted 12 Mar 2010 in javascript, python, qt, and webkit

     

    In the previous post I covered how to tackle JavaScript-based websites with Chickenfoot. Chickenfoot is great but not perfect because it:

    1. requires me to program in JavaScript rather than my beloved Python (with all its great libraries)
    2. is slow because I have to wait for Firefox to render the entire webpage
    3. is somewhat buggy and has a small user/developer community, mostly at MIT

    An alternative solution that addresses all these points is WebKit, the open source browser engine used most famously in Apple's Safari browser. WebKit has now been ported to the Qt framework and can be used through Qt's Python bindings (PyQt4 in the example below).

    Here is a simple class that renders a webpage (including executing any JavaScript) and then makes the final HTML available as a string:

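    import sys
    from PyQt4.QtGui import QApplication
    from PyQt4.QtCore import QUrl
    from PyQt4.QtWebKit import QWebPage

    class Render(QWebPage):
      """Load a URL in WebKit and block until the page has finished loading."""

      def __init__(self, url):
        # A QApplication must exist before any other Qt object is created
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished.connect(self._loadFinished)
        self.mainFrame().load(QUrl(url))
        # Run the Qt event loop until _loadFinished() calls quit()
        self.app.exec_()

      def _loadFinished(self, result):
        # result is False if the load failed; the frame is kept either way
        self.frame = self.mainFrame()
        self.app.quit()

    url = 'http://webscraping.com'
    r = Render(url)
    # The DOM after the page's load-time JavaScript has run, serialized as HTML
    html = r.frame.toHtml()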
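
To keep a copy of the rendered page on disk, the string can be written out as UTF-8. This is only a minimal sketch: it assumes Python 3 (where `toHtml()` should hand back an ordinary `str`), and `save_html`, `sample_html` and the `page.html` file name are hypothetical names used for illustration. The sample string stands in for the value of `html` from the example above so the snippet runs without Qt installed.

```python
import io
import os
import tempfile


def save_html(html, path):
    """Write an HTML string to `path` as UTF-8 and return the character count."""
    with io.open(path, "w", encoding="utf-8") as handle:
        return handle.write(html)


# Stand-in for the string returned by r.frame.toHtml() in the example above.
sample_html = "<html><head><title>Demo</title></head><body>caf\u00e9</body></html>"

# Write into a fresh temporary directory so nothing existing is overwritten.
output_path = os.path.join(tempfile.mkdtemp(), "page.html")
written = save_html(sample_html, output_path)

# Read the file back to confirm the non-ASCII character survived the round trip.
with io.open(output_path, encoding="utf-8") as handle:
    round_trip = handle.read()
```

Naming the encoding explicitly avoids depending on the platform default, which matters once a page contains non-ASCII text.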
    

    I can then analyze this resulting HTML with my standard Python tools like the webscraping module.
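
As one illustration of that analysis step, the rendered HTML can be fed to the standard library's `html.parser`. This is a sketch under assumptions rather than the webscraping module's own API: it targets Python 3, `LinkCollector` and `summarize` are hypothetical names, and `sample_html` stands in for the string produced by `r.frame.toHtml()`.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the page title and every href found in <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            # attrs is a list of (name, value) pairs; anchors without href are skipped
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text that appears between <title> and </title> is kept
        if self._in_title:
            self.title += data


def summarize(html):
    """Return (title, list_of_links) for an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.links


# Stand-in for the HTML string returned by the Render class.
sample_html = """<html><head><title>Demo page</title></head>
<body><a href="/blog">Blog</a> <a name="top">anchor</a> <a href="/faq">FAQ</a></body></html>"""

title, links = summarize(sample_html)
print(title, links)
```

Because the input is the HTML after rendering, links that JavaScript inserted during page load would be picked up in the same way as static ones.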

  • Original URL: https://www.cnblogs.com/lexus/p/3579770.html