• scrapy中的request


    scrapy中的request
    初始化参数
    class scrapy.http.Request(
    url [ ,
    callback,
    method='GET',
    headers,
    body,
    cookies,
    meta,
    encoding='utf-8',
    priority=0,
     don't_filter=False,
     errback ] )
    
    
    1,生成Request的方法
    def parse_page1(self, response):
        return scrapy.Request("http://www.example.com/some_page.html",
                              callback=self.parse_page2)
    
    def parse_page2(self, response):
        # this would log http://www.example.com/some_page.html
        self.logger.info("Visited %s", response.url)
    
    2,通过Request传递数据的方法
    def parse_page1(self, response):
        item = MyItem()
        item['main_url'] = response.url
        request = scrapy.Request("http://www.example.com/some_page.html",
                                 callback=self.parse_page2)
        request.meta['item'] = item
        yield request
    
    def parse_page2(self, response):
        item = response.meta['item']
        item['other_url'] = response.url
        yield item
    
    3,Request.meta中的特殊关键字
    
    
    4,主要子类FormRequest,用于登陆
    return [FormRequest(url="http://www.example.com/post/action",
                        formdata={'name': 'John Doe', 'age': '27'},
                        callback=self.after_post)]
    
    更相信的登陆的例子
    import scrapy
    
    class LoginSpider(scrapy.Spider):
        name = 'example.com'
        start_urls = ['http://www.example.com/users/login.php']
    
        def parse(self, response):
            return scrapy.FormRequest.from_response(
                response,
                formdata={'username': 'john', 'password': 'secret'},
                callback=self.after_login
            )
    
        def after_login(self, response):
            # check login succeed before going on
            if "authentication failed" in response.body:
                self.logger.error("Login failed")
                return
    
            # continue scraping with authenticated session...
  • 相关阅读:
    Scrapy框架实现持久化存储
    Scrapy框架的介绍和基本使用
    处理页面动态加载数据
    爬虫数据解析
    Python爬虫基础
    Flask详解(下篇)
    Flask详解(中篇)
    CentOS 中的性能监测命令vmstat
    CentOS 7安装MySQL 8.0.15
    CF B.Kind Anton(4月8号)
  • 原文地址:https://www.cnblogs.com/themost/p/7106250.html
Copyright © 2020-2023  润新知