Setting rules for a Scrapy CrawlSpider

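    # Requires: from scrapy.spiders import CrawlSpider, Rule
    #           from scrapy.linkextractors import LinkExtractor
    # LinkExtractor replaces the deprecated SgmlLinkExtractor.
    rules = [
        # Follow only the "next article" link whose URL matches the allow pattern.
        Rule(LinkExtractor(allow=r'/u012150179/article/details',
                           restrict_xpaths='//li[@class="next_article"]'),
             callback='parse_item',
             follow=True)
    ]

    def parse_item(self, response):
        # Called for every page reached through the rule above.
        item = CsdnblogcrawlspiderItem()
        blog_name = response.xpath('//div[@id="article_details"]/div/h1/span/a/text()').extract()

        # response.url and the extracted text are already str on Python 3;
        # encoding them to UTF-8 bytes is a Python 2 habit and is not needed.
        item['blog_name'] = blog_name
        item['blog_url'] = response.url

        return item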
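
To see which URLs the `allow` pattern in the rule above accepts, here is a minimal sketch that applies the same regular expression with Python's `re` module. It assumes the link extractor searches the pattern anywhere in the absolute URL. The helper `is_allowed` and the sample URLs are hypothetical and exist only for illustration; they are not part of Scrapy or of the original spider.

```python
import re

# Same pattern as the `allow` argument in the rule above.
ALLOW_PATTERN = re.compile(r'/u012150179/article/details')


def is_allowed(url):
    """Return True if the URL contains the allowed path fragment."""
    return ALLOW_PATTERN.search(url) is not None


# Hypothetical sample URLs, for illustration only.
sample_urls = [
    'http://blog.csdn.net/u012150179/article/details/11749017',
    'http://blog.csdn.net/u012150179/article/list/1',
]

# Keep only the URLs that the pattern would let through.
matched = [url for url in sample_urls if is_allowed(url)]
```

Only the article-detail URL ends up in `matched`; the list page is filtered out. In the spider itself, `restrict_xpaths` narrows this further to links found inside the `next_article` list item.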

Original post: https://www.cnblogs.com/Erick-L/p/6836448.html