• Examples of defining information sources (XML files)


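    Some shared information sources are available here; click the Download button to download them.

    Contents

    1. Subscribing to a blog (a simple example)

    2. Fetching information from a web page (a simple example)

    3. Making full use of callback code

    4. Multiple blocks in html_re

    5. Parsing JSON data with the html_json worker

    1. Subscribing to a blog (a simple example):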

    <source>
        <name>Fan Zhihong's Blog</name>
        <comment>Sohu blog. Original nutrition information.</comment>
        <link>http://snowheart19.blog.sohu.com/</link>
    
        <worker>rss_atom</worker>
        <data>
            <url>http://snowheart19.blog.sohu.com/rss</url>
        </data>
    </source>

    2. Fetching information from a web page (a simple example):

    <source>
        <name>ybk168 new stamp announcements</name>
        <comment>ybk168 new stamp issue announcements</comment>
        <link>http://www.ybk168.com/newslist/00040051.html</link>
    
        <worker>html_re</worker>
        <data>
            <url>http://www.ybk168.com/newslist/00040051.html</url>
            <block>
                <blockre flags='DOTALL'>
    <![CDATA[
    <div class="list">(.*?)<div class="page">
    ]]>
                </blockre>
                <itemre flags='DOTALL'>
    <![CDATA[
    <li><span.*?href="([^"]+)".*?title="([^"]+)".*?
    class="list_lr">([^<]+)<
    ]]>
                </itemre>
                <maprules>
                    <title>2</title>
                    <url>'http://www.ybk168.com', 1</url>
                    <pub_date>3</pub_date>
                </maprules>
            </block>
        </data>
    </source>
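How html_re applies these elements internally is not described here. The following Python sketch is one plausible reading, not infopi's actual code: `blockre` cuts a region out of the page, `itemre` is matched repeatedly inside that region, and each `maprules` entry selects a capture group by number, with quoted strings treated as literal text to prepend. The function name `extract_items`, the tuple form of the rules, and the sample HTML are assumptions made for illustration; the two-line `itemre` is joined into one line.

```python
import re

def extract_items(html, block_re, item_re, map_rules):
    """Apply a block regex, then an item regex, then map groups to fields.

    map_rules maps a field name to a tuple whose parts are either literal
    strings or integer capture-group numbers; the parts are concatenated.
    """
    block = re.search(block_re, html, re.DOTALL)
    if block is None:
        return []
    items = []
    for m in re.finditer(item_re, block.group(1), re.DOTALL):
        item = {}
        for field, parts in map_rules.items():
            item[field] = ''.join(
                m.group(p) if isinstance(p, int) else p for p in parts
            )
        items.append(item)
    return items

# Made-up page fragment shaped like the list the regexes expect.
SAMPLE = (
    '<div class="list"><ul>'
    '<li><span><a href="/news/1.html" title="New stamp A">A</a></span>'
    '<span class="list_lr">2015-10-12</span></li>'
    '<li><span><a href="/news/2.html" title="New stamp B">B</a></span>'
    '<span class="list_lr">2015-10-13</span></li>'
    '</ul></div><div class="page">1 2 3</div>'
)

# Same mapping as the <maprules> element above.
RULES = {
    'title': (2,),
    'url': ('http://www.ybk168.com', 1),
    'pub_date': (3,),
}

ITEMS = extract_items(
    SAMPLE,
    r'<div class="list">(.*?)<div class="page">',
    r'<li><span.*?href="([^"]+)".*?title="([^"]+)".*?class="list_lr">([^<]+)<',
    RULES,
)
for item in ITEMS:
    print(item)
```

Trying the regexes against a saved copy of the page in this way can help when a site changes its markup and a source stops returning items.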

    3. Making full use of callback code:

    <source>
        <name>Beijing air quality</name>
        <comment>Weibo account of Beijing environmental monitoring.</comment>
        <link>http://weibo.cn/u/2516831703</link>
    
        <worker>html_re</worker>
        <data>
            <url>http://weibo.cn/u/2516831703</url>
            
            <block>
                <blockre flags='DOTALL'>
    <![CDATA[
    <div class="b">(.*)$
    ]]>
                </blockre>
                <itemre flags='DOTALL'>
    <![CDATA[
    weibo.cn\[([\d-]+)
    ]]>
                </itemre>
                <maprules>
                    <title>'notitle'</title>
                    <pub_date>1</pub_date>
                    <suid>1</suid>
                </maprules>
            </block>
            
            <block>
                <blockre flags='DOTALL'>
    <![CDATA[
    ^(?:.*?\[<span class="kt">置顶</span>\]|.*?<span class="pms">)
    (.*?)
    <input type="submit" value="查看更多内容"
    ]]>
                </blockre>
                <itemre flags='DOTALL'>
    <![CDATA[
    <div class="c" id="([^"]+)">
    (?:<div><span class="ctt">|.*?<span class="cmt">转发理由:</span>)
    (.*?)
    (?:</span>|<a [^>]+>赞\[\d+\]).*?
    <span class="ct">([^& ]+)
    ]]>
                </itemre>
                <maprules>
                    <title>'notitle'</title>
                    <summary>2</summary>
                    <pub_date>3</pub_date>
                    <suid>1</suid>
                </maprules>
            </block>
        </data>
        
        <callback>
    <![CDATA[
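    # posi == 0 is the item from the first block; it only supplies the date.
    if posi == 0:
        temp_date = info.pub_date
        info.temp = 'del'
    # Items whose pub_date contains '日' are marked 'del'.
    elif '日' in info.pub_date:
        info.temp = 'del'
    else:
        s = info.summary
        # Keep the post when its text contains any of these markers
        # or lacks '浓度】'; otherwise mark it 'del'.
        if ',' in s or \
           '利' in s or \
           '有' in s or \
           '散' in s or \
           '染' in s or \
           '预' in s or \
           '【8时' in s or \
           '浓度】' not in s:
            info.url = 'http://weibo.cn/u/2516831703'
            info.pub_date = ''
            info.title = '[' + temp_date + '] ' + s[:16] + '…'
        else:
            info.temp = 'del'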
    ]]>
        </callback>
    </source>
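To see what the callback does to a batch of items, its logic can be replayed outside infopi. The sketch below is a rough simulation under several assumptions: the harness `apply_callback`, the `SimpleNamespace` stand-in for `info`, and the sample posts are all invented here, and it assumes that items marked `temp == 'del'` are discarded and that `temp_date` survives from one callback call to the next.

```python
from types import SimpleNamespace

# Markers taken from the callback's condition.
KEYWORDS = (',', '利', '有', '散', '染', '预', '【8时')

def apply_callback(items):
    """Replay the callback over extracted items and return the ones kept.

    items[0] is assumed to be the page-date item from the first block.
    """
    temp_date = ''
    kept = []
    for posi, info in enumerate(items):
        if posi == 0:
            temp_date = info.pub_date      # remember the page date
            info.temp = 'del'
        elif '日' in info.pub_date:
            info.temp = 'del'              # pub_date carries an explicit day
        else:
            s = info.summary
            if any(k in s for k in KEYWORDS) or '浓度】' not in s:
                info.url = 'http://weibo.cn/u/2516831703'
                info.pub_date = ''
                info.title = '[' + temp_date + '] ' + s[:16] + '…'
            else:
                info.temp = 'del'
        if getattr(info, 'temp', None) != 'del':
            kept.append(info)
    return kept

def make(pub_date, summary=''):
    """Build a fake item with the fields the callback touches."""
    return SimpleNamespace(pub_date=pub_date, summary=summary,
                           title='notitle', url='')

ITEMS = [
    make('10-13'),                                   # date item from block 1
    make('10月12日', '【8时浓度】…'),                  # dropped: '日' in pub_date
    make('今天', '【10时浓度】PM2.5 为 35'),           # dropped: no marker matches
    make('今天', '空气质量预报:明日有利于扩散'),        # kept: contains '预'
]
KEPT = apply_callback(ITEMS)
for info in KEPT:
    print(info.title)
```

Only the last sample post survives, and its title is prefixed with the date captured from the first block.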

    4. Multiple blocks in html_re:

    <source>
        <name>Chinese National Geography</name>
        <comment>Chinese National Geography</comment>
        <link>http://www.dili360.com/</link>
    
        <worker>html_re</worker>
        <data>
            <url>http://www.dili360.com/</url>
            
            <block>
                <blockre flags='DOTALL'>
    <![CDATA[
    <div class="community-item" id="community-items" >
    (.*?)<!--end-->
    ]]>
                </blockre>
                <itemre flags='DOTALL'>
    <![CDATA[
    <li class="img-block".*?
    <a target="_blank" href="([^"]+)">.*?
    <h4>(.*?)</h4>
    ]]>
                </itemre>
                <maprules>
                    <title>2</title>
                    <url>'http://www.dili360.com', 1</url>
                </maprules>
            </block>
            
            <block>
                <blockre flags='DOTALL'>
    <![CDATA[
    <div class="community-item" id="community-items" >
    (.*?)<!--end-->
    ]]>
                </blockre>
                <itemre flags='DOTALL'>
    <![CDATA[
    <dt><a href="([^"]+)" target="_blank">(.*?)</a></dt>
    ]]>
                </itemre>
                <maprules>
                    <title>2</title>
                    <url>'http://www.dili360.com', 1</url>
                </maprules>
            </block>
            
            <block>
                <blockre flags='DOTALL'>
    <![CDATA[
    <ul class="style-1" id="replace">(.*?)</ul>
    ]]>
                </blockre>
                <itemre flags='DOTALL'>
    <![CDATA[
    <div class="detail">.*?
    <a href="([^"]+)" target="_blank"><h4>(.*?)</h4>
    ]]>
                </itemre>
                <maprules>
                    <title>2</title>
                    <url>'http://www.dili360.com', 1</url>
                    <summary>'景观图片'</summary>
                </maprules>
            </block>
            
        </data>
    </source>
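With several `<block>` elements, the same downloaded page is presumably scanned once per block and the results are collected into one list. The Python sketch below illustrates that idea only; `run_blocks`, the triple layout, and the sample page are assumptions for this example, and skipping a block whose region is missing is a choice made here, since what infopi does in that case is not stated.

```python
import re

def run_blocks(html, blocks):
    """Run each (block_re, item_re, base_url) triple over the same page."""
    results = []
    for block_re, item_re, base_url in blocks:
        block = re.search(block_re, html, re.DOTALL)
        if block is None:
            continue  # region not found: skipped in this sketch
        for href, title in re.findall(item_re, block.group(1), re.DOTALL):
            results.append({'title': title, 'url': base_url + href})
    return results

# Made-up page containing two of the regions targeted above.
PAGE = (
    '<div class="community-item" id="community-items" >'
    '<dl><dt><a href="/a/1.htm" target="_blank">First</a></dt></dl>'
    '<!--end-->'
    '<ul class="style-1" id="replace"><li><div class="detail"><p>x</p>'
    '<a href="/pic/2.htm" target="_blank"><h4>Second</h4></a></div></li></ul>'
)

BLOCKS = [
    (r'<div class="community-item" id="community-items" >(.*?)<!--end-->',
     r'<dt><a href="([^"]+)" target="_blank">(.*?)</a></dt>',
     'http://www.dili360.com'),
    (r'<ul class="style-1" id="replace">(.*?)</ul>',
     r'<div class="detail">.*?<a href="([^"]+)" target="_blank"><h4>(.*?)</h4>',
     'http://www.dili360.com'),
]

RESULTS = run_blocks(PAGE, BLOCKS)
for row in RESULTS:
    print(row)
```

Two blocks may share the same `blockre` and differ only in `itemre`, as the first two blocks of the source above do, when one region of the page mixes two item layouts.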

    5. Parsing JSON data with the html_json worker:

    <source>
        <name>Sina book news</name>
        <comment>Sina Books, book news.</comment>
        <link>http://book.sina.com.cn/</link>
    
        <worker>html_json</worker>
        <data>
            <url>http://feed.mix.sina.com.cn/api/roll/get?callback=jsonp1436772833418&amp;pageid=8&amp;lid=156&amp;num=20</url>
            <re  flags='DOTALL'>
    <![CDATA[
    ^try\{\w+\(
    (.*)
    \);\}catch\(e\)\{\};$
    ]]>
            </re>
        
            <block>
                <block_path>'result', 'data'</block_path>
                <title>'title'</title>
                <url>'url'</url>
                <summary>'summary'</summary>
                <temp>'intro'</temp>
                <pub_date>'ctime'</pub_date>
            </block>
        </data>
        
        <callback>
    <![CDATA[
    info.pub_date = unixtime(info.pub_date)
    info.summary = info.summary or info.temp
    info.temp = 0
    ]]>
        </callback>
    </source>
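The steps this source relies on can be sketched in plain Python: strip the JSONP wrapper with a regex, walk the keys listed in `block_path`, pick the named fields from each entry, then apply the callback. This is an illustration under assumptions, not the worker's real code: `parse_jsonp` and the sample response are invented, and the `unixtime` defined here is only a stand-in that formats epoch seconds as UTC text; the helper available to real callbacks may behave differently.

```python
import json
import re
from datetime import datetime, timezone

# The <re> element joined into one line, with regex metacharacters escaped.
JSONP_RE = re.compile(r'^try\{\w+\((.*)\);\}catch\(e\)\{\};$', re.DOTALL)

def parse_jsonp(text, block_path, fields):
    """Strip the JSONP wrapper, walk block_path, and pick fields per entry."""
    m = JSONP_RE.search(text.strip())
    if m is None:
        raise ValueError('response does not look like the expected JSONP')
    node = json.loads(m.group(1))
    for key in block_path:
        node = node[key]
    return [{name: entry.get(key) for name, key in fields.items()}
            for entry in node]

def unixtime(value):
    """Stand-in helper (assumption): epoch seconds to a UTC timestamp string."""
    moment = datetime.fromtimestamp(int(value), timezone.utc)
    return moment.strftime('%Y-%m-%d %H:%M:%S')

# Made-up response in the shape the regex and block_path expect.
RESPONSE = (
    'try{jsonp1436772833418({"result": {"data": [{"title": "Book A", '
    '"url": "http://book.sina.com.cn/a", "summary": "", '
    '"intro": "Short intro", "ctime": "1436772833"}]}});}catch(e){};'
)

FIELDS = {'title': 'title', 'url': 'url', 'summary': 'summary',
          'temp': 'intro', 'pub_date': 'ctime'}

ITEMS = parse_jsonp(RESPONSE, ('result', 'data'), FIELDS)
for item in ITEMS:
    # Same three steps as the callback above.
    item['pub_date'] = unixtime(item['pub_date'])
    item['summary'] = item['summary'] or item['temp']
    item['temp'] = 0
    print(item)
```

The `temp` field serves as scratch space: it carries `intro` into the callback so that an empty `summary` can fall back to it, and is then cleared.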
  • Original article: https://www.cnblogs.com/infopi/p/4871223.html