• Scrapy:配置日志


    Scrapy logger 在每个spider实例中提供了一个可以访问和使用的实例,方法如下:

    import scrapy 
    
    class MySpider(scrapy.Spider):
            name = 'myspider'
            start_url = ['https://www.baidu.com']
            
            def  parse(self,response):
                    self.logger.info('Parse function called on %s',response.url)

    方法二:

    该记录器是使用spider的名称创建的,当然也可以应用到任意项目中

    import logging
    import scrapy 
    
    logger = logging.getLogger('mycustomlogger')
    #创建logger模块
    
    class MySpider(scrapy.Spider):
            name = 'myspider'
            start_url = ['https://www.baidu.com']
    
    #触发模块            
            def  parse(self,response):
            logger.info('Parse function called on %s',response.url)

    只需使用logging.getLogger函数获取其名称即可使用其记录器:

    import logging
    
    logger = logging.getLogger('mycustomlogger')
    logger.warning('This  is a warning')

    so anyway:我们也可以使用__name__变量填充当前模块的路径,确保正在处理的任何模块设置自定义记录器:

    import logging
    
    logger = logging.getLogger(__name__)
    logger.warning('This  is a warning')

    在scrapy项目的settings 文件中配置

    LOG_ENABLED = True #是否启动日志记录,默认True
    LOG_ENCODING = 'UTF-8'
    LOG_FILE = 'TEST1.LOG'#日志输出文件,如果为NONE,就打印到控制台
    LOG_LEVEL = 'INFO'#日志级别,默认debug
    LOG_FORMAT #日志格式
    LOG_DATEFORMAT#日志日期格式
    LOG_STDOUT #日志标准输出,默认False,如果True所有标准输出都将写入日志中,比如代码中的print输出也会被写入到文件
    LOG_SHORT_NAMES#短日志名,默认为false,如果为True将不输出组件名

     日志如下

    2019-04-26 15:48:33 [scrapy.utils.log] INFO: Scrapy 1.6.0 started (bot: patencent)
    2019-04-26 15:48:33 [scrapy.utils.log] INFO: Versions: lxml 4.3.3.0, libxml2 2.9.5, cssselect 1.0.3, parsel 1.5.1, w3lib 1.20.0, Twisted 19.2.0, Python 3.7.2 (tags/v3.7.2:9a3ffc0492, Dec 23 2018, 23:09:28) [MSC v.1916 64 bit (AMD64)], pyOpenSSL 19.0.0 (OpenSSL 1.1.1b  26 Feb 2019), cryptography 2.6.1, Platform Windows-10-10.0.17134-SP0
    2019-04-26 15:48:33 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'patencent', 'LOG_ENCODING': 'UTF8', 'LOG_FILE': 'TEST1.LOG', 'LOG_LEVEL': 'INFO', 'NEWSPIDER_MODULE': 'patencent.spiders', 'SPIDER_MODULES': ['patencent.spiders'], 'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}
    2019-04-26 15:48:34 [scrapy.extensions.telnet] INFO: Telnet Password: 08683c5af998704b
    2019-04-26 15:48:34 [scrapy.middleware] INFO: Enabled extensions:
    ['scrapy.extensions.corestats.CoreStats',
     'scrapy.extensions.telnet.TelnetConsole',
     'scrapy.extensions.logstats.LogStats']
    2019-04-26 15:48:34 [scrapy.middleware] INFO: Enabled downloader middlewares:
    ['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
     'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
     'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
     'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
     'scrapy.downloadermiddlewares.retry.RetryMiddleware',
     'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
     'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
     'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
     'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
     'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
     'scrapy.downloadermiddlewares.stats.DownloaderStats']
    2019-04-26 15:48:34 [scrapy.middleware] INFO: Enabled spider middlewares:
    ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
     'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
     'scrapy.spidermiddlewares.referer.RefererMiddleware',
     'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
     'scrapy.spidermiddlewares.depth.DepthMiddleware']
    2019-04-26 15:48:34 [scrapy.middleware] INFO: Enabled item pipelines:
    ['patencent.pipelines.PatencentPipeline']
    2019-04-26 15:48:34 [scrapy.core.engine] INFO: Spider opened
    2019-04-26 15:48:34 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
    2019-04-26 15:48:34 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
    2019-04-26 15:49:34 [scrapy.extensions.logstats] INFO: Crawled 1384 pages (at 1384 pages/min), scraped 1255 items (at 1255 items/min)
    2019-04-26 15:49:52 [scrapy.utils.log] INFO: Scrapy 1.6.0 started (bot: patencent)
    2019-04-26 15:49:52 [scrapy.utils.log] INFO: Versions: lxml 4.3.3.0, libxml2 2.9.5, cssselect 1.0.3, parsel 1.5.1, w3lib 1.20.0, Twisted 19.2.0, Python 3.7.2 (tags/v3.7.2:9a3ffc0492, Dec 23 2018, 23:09:28) [MSC v.1916 64 bit (AMD64)], pyOpenSSL 19.0.0 (OpenSSL 1.1.1b  26 Feb 2019), cryptography 2.6.1, Platform Windows-10-10.0.17134-SP0
    2019-04-26 15:49:52 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'patencent', 'CONCURRENT_REQUESTS': 32, 'LOG_ENCODING': 'UTF8', 'LOG_FILE': 'TEST1.LOG', 'LOG_LEVEL': 'INFO', 'NEWSPIDER_MODULE': 'patencent.spiders', 'SPIDER_MODULES': ['patencent.spiders'], 'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}
    2019-04-26 15:49:52 [scrapy.extensions.telnet] INFO: Telnet Password: b2f1951d137cf133
    2019-04-26 15:49:52 [scrapy.middleware] INFO: Enabled extensions:
    ['scrapy.extensions.corestats.CoreStats',
     'scrapy.extensions.telnet.TelnetConsole',
     'scrapy.extensions.logstats.LogStats']
    2019-04-26 15:49:52 [scrapy.middleware] INFO: Enabled downloader middlewares:
    ['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
     'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
     'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
     'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
     'scrapy.downloadermiddlewares.retry.RetryMiddleware',
     'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
     'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
     'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
     'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
     'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
     'scrapy.downloadermiddlewares.stats.DownloaderStats']
    2019-04-26 15:49:52 [scrapy.middleware] INFO: Enabled spider middlewares:
    ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
     'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
     'scrapy.spidermiddlewares.referer.RefererMiddleware',
     'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
     'scrapy.spidermiddlewares.depth.DepthMiddleware']
    2019-04-26 15:49:52 [scrapy.middleware] INFO: Enabled item pipelines:
    ['patencent.pipelines.PatencentPipeline']
    2019-04-26 15:49:52 [scrapy.core.engine] INFO: Spider opened
    2019-04-26 15:49:52 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
    2019-04-26 15:49:52 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
    2019-04-26 15:50:43 [scrapy.utils.log] INFO: Scrapy 1.6.0 started (bot: patencent)
    2019-04-26 15:50:43 [scrapy.utils.log] INFO: Versions: lxml 4.3.3.0, libxml2 2.9.5, cssselect 1.0.3, parsel 1.5.1, w3lib 1.20.0, Twisted 19.2.0, Python 3.7.2 (tags/v3.7.2:9a3ffc0492, Dec 23 2018, 23:09:28) [MSC v.1916 64 bit (AMD64)], pyOpenSSL 19.0.0 (OpenSSL 1.1.1b  26 Feb 2019), cryptography 2.6.1, Platform Windows-10-10.0.17134-SP0
    2019-04-26 15:50:43 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'patencent', 'CONCURRENT_REQUESTS': 32, 'LOG_ENCODING': 'UTF8', 'LOG_FILE': 'TEST1.LOG', 'LOG_LEVEL': 'INFO', 'NEWSPIDER_MODULE': 'patencent.spiders', 'SPIDER_MODULES': ['patencent.spiders'], 'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}
    2019-04-26 15:50:43 [scrapy.extensions.telnet] INFO: Telnet Password: 24d0317609676d2e
    2019-04-26 15:50:43 [scrapy.middleware] INFO: Enabled extensions:
    ['scrapy.extensions.corestats.CoreStats',
     'scrapy.extensions.telnet.TelnetConsole',
     'scrapy.extensions.logstats.LogStats']
    2019-04-26 15:50:44 [scrapy.middleware] INFO: Enabled downloader middlewares:
    ['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
     'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
     'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
     'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
     'scrapy.downloadermiddlewares.retry.RetryMiddleware',
     'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
     'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
     'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
     'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
     'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
     'scrapy.downloadermiddlewares.stats.DownloaderStats']
    2019-04-26 15:50:44 [scrapy.middleware] INFO: Enabled spider middlewares:
    ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
     'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
     'scrapy.spidermiddlewares.referer.RefererMiddleware',
     'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
     'scrapy.spidermiddlewares.depth.DepthMiddleware']
    2019-04-26 15:50:44 [scrapy.middleware] INFO: Enabled item pipelines:
    ['patencent.pipelines.PatencentPipeline']
    2019-04-26 15:50:44 [scrapy.core.engine] INFO: Spider opened
    2019-04-26 15:50:44 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
    2019-04-26 15:50:44 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
    2019-04-26 15:51:44 [scrapy.extensions.logstats] INFO: Crawled 1364 pages (at 1364 pages/min), scraped 1238 items (at 1238 items/min)
    View Code
  • 相关阅读:
    Javascript获取本周,本月,本季,本年,上月,上周,上季,去年,上二周,上二月
    SQL SERVER 2008 评估期已过的解决办法
    习惯那些“小事”
    Oracle 测试语句
    整理js常用按键相关代码
    .NET 学习笔记
    lamda表达式学习
    使用Html.DropDownList
    ibatis
    MyBatis
  • 原文地址:https://www.cnblogs.com/jackzz/p/10774517.html
Copyright © 2020-2023  润新知