• Using YQL as crawler for Javascript


    http://www.julianwong.net/blog/2009/06/using-yql-as-crawler-for-javascript/


    It is good fun to play with Yahoo! Query Language (YQL). YQL is a service that enables applications to query, filter, and combine data from different sources across the Internet. Much of the data in the Yahoo! network can be retrieved through YQL with a SQL-like syntax.

    SELECT * FROM flickr.photos.search WHERE text="cat"

    This means: search Flickr for photos whose text matches "cat". But what caught my attention is the ability to convert the content (an HTML page) of an external site into well-formatted XML or JSON.

    select * from html where url="http://news.yahoo.com/" and xpath="/html/body/div[@id='doc4']/div[@id='bd']/div[@id='yui-main']/div/div[@id='top-story']/div/div[1]/div[2]/h2/a"

    The YQL above will return the headline from Yahoo! News. The XPath part looks pretty scary, but with the XPather Firefox add-on you can get the XPath of any DOM element with right click -> Show in XPather. (P.S. One thing to watch out for with XPather is the tbody tag: Firefox adds it to its DOM tree for tables even when it does not exist in the source HTML. That extra tbody makes YQL return nothing, because the element never appears in the HTML that YQL parses.)
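    The tbody caveat can be handled mechanically before the query is sent. The following is a minimal sketch, not part of the original post: stripTbody, buildHtmlQuery and buildYqlUrl are hypothetical helper names, the sample XPath is made up for illustration, and the endpoint is the public YQL URL as it existed when the post was written (the service has since been shut down, so the sketch only builds the request and does not send it).

```javascript
// Public YQL endpoint as it existed when the post was written (assumption:
// the service is no longer reachable, so this is illustrative only).
const YQL_ENDPOINT = "https://query.yahooapis.com/v1/public/yql";

// Hypothetical helper: drop every "tbody" step (with or without a
// predicate such as [1]) from an XPath copied out of Firefox.
function stripTbody(xpath) {
  return xpath.replace(/\/tbody(\[[^\]]*\])?(?=\/|$)/g, "");
}

// Hypothetical helper: build a YQL statement against the html table,
// using the same shape as the query shown above.
function buildHtmlQuery(pageUrl, xpath) {
  return `select * from html where url="${pageUrl}" and xpath="${stripTbody(xpath)}"`;
}

// Hypothetical helper: turn a YQL statement into a request URL.
// URLSearchParams takes care of encoding the quotes, slashes and spaces.
function buildYqlUrl(query) {
  const params = new URLSearchParams({ q: query, format: "json" });
  return `${YQL_ENDPOINT}?${params}`;
}

// Made-up XPath containing the tbody step that Firefox inserts.
const query = buildHtmlQuery(
  "http://news.yahoo.com/",
  "/html/body/table/tbody/tr[1]/td[2]/a"
);
console.log(query);
console.log(buildYqlUrl(query));
```

    Stripping tbody unconditionally is only appropriate when the source HTML really lacks the tag; if the page author wrote tbody explicitly, the XPath should be left alone.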

    This is an excellent tool for JavaScript. Imagine you are going to implement an RSS reader. Without YQL, the application must prepare all the data on the server side and send it back to the client (Fig. 1). This is bad for performance because curl calls are blocking, while YQL requests made from the client browser can be asynchronous and run in parallel. It therefore seems wise to offload the data crawling to the client (Fig. 2).


    Fig. 1 The web application prepares all the data on the server side.

    Fig. 2 Offloading the blocking curl calls to client-side parallel YQL requests.
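
    The client-side approach of Fig. 2 can be sketched roughly as follows. This is an illustration under stated assumptions rather than code from the original post: runYql, loadFeeds and fakeFetch are hypothetical names, the rss table name and the endpoint are assumed from YQL as it existed at the time, and a stand-in for fetch is injected so the example runs without the (now retired) service.

```javascript
// Public YQL endpoint as it existed when the post was written (assumption).
const YQL_ENDPOINT = "https://query.yahooapis.com/v1/public/yql";

// Hypothetical helper: request one YQL statement and resolve with parsed JSON.
// fetchImpl is injectable so the sketch can run without the network.
async function runYql(query, fetchImpl) {
  const params = new URLSearchParams({ q: query, format: "json" });
  const response = await fetchImpl(`${YQL_ENDPOINT}?${params}`);
  return response.json();
}

// Start every request immediately and wait for all of them together, so the
// total time is roughly that of the slowest feed rather than the sum.
function loadFeeds(feedUrls, fetchImpl) {
  return Promise.all(
    feedUrls.map((feedUrl) =>
      runYql(`select * from rss where url="${feedUrl}"`, fetchImpl)
    )
  );
}

// Stand-in for fetch: resolves after a short delay with a fake payload
// that simply echoes the YQL statement it was asked to run.
function fakeFetch(url) {
  const q = new URL(url).searchParams.get("q");
  return new Promise((resolve) =>
    setTimeout(() => resolve({ json: async () => ({ query: q }) }), 50)
  );
}

// Placeholder feed addresses, used only for the demonstration.
loadFeeds(["http://example.com/a.xml", "http://example.com/b.xml"], fakeFetch)
  .then((results) => console.log(results.length, "feeds loaded"));
```

    In a browser, passing the global fetch as fetchImpl would issue real requests. Promise.all keeps the results in the same order as the feed list, which makes it easy to match each response to its feed.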






  • Original address: https://www.cnblogs.com/lexus/p/2213821.html