• Scrapy: custom commands


    Running a single spider

    from scrapy.cmdline import execute

    if __name__ == '__main__':
        # Equivalent to running "scrapy crawl chouti --nolog" in the shell
        execute(["scrapy", "crawl", "chouti", "--nolog"])
    

    Then right-click the .py file and run it; this starts the spider named 'chouti'.
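The execute() call above is just a programmatic wrapper around the command line; assuming the project defines a spider named chouti, the equivalent invocation from the project root is:

```shell
# Run the 'chouti' spider with logging suppressed (run from the project root)
scrapy crawl chouti --nolog
```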

    Running multiple spiders at once

    The steps are:

    - Create a directory at the same level as spiders, e.g. commands (add an empty __init__.py so it is importable as a package)
    - Inside it, create a file named crawlall.py (the file name becomes the name of the custom command)
    - In settings.py, add the setting COMMANDS_MODULE = '<project name>.<directory name>'
    - From the project directory, run: scrapy crawlall
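Assuming a project named myproject (a placeholder; names other than crawlall.py are illustrative), the steps above produce a layout like:

```
myproject/
    settings.py
    spiders/
        ...
    commands/
        __init__.py
        crawlall.py
```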

    The code:

    from scrapy.commands import ScrapyCommand


    class Command(ScrapyCommand):

        requires_project = True

        def syntax(self):
            return '[options]'

        def short_desc(self):
            return 'Runs all of the spiders'

        def run(self, args, opts):
            # List every spider registered in the project (older Scrapy
            # versions exposed this as self.crawler_process.spiders),
            # schedule each one on the shared crawler process, then
            # start them all together
            spider_list = self.crawler_process.spider_loader.list()
            for name in spider_list:
                self.crawler_process.crawl(name, **opts.__dict__)
            self.crawler_process.start()
    crawlall.py
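To wire the command in, COMMANDS_MODULE must point at the package containing crawlall.py. For example, if the project is named myproject (a placeholder) and the directory is commands, the line in settings.py would be:

```python
# settings.py -- 'myproject' is a placeholder for the actual project name
COMMANDS_MODULE = 'myproject.commands'
```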
    

  • Original post: https://www.cnblogs.com/jiangchunsheng/p/9260221.html