• How to control the field export order in Scrapy


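    1. The problem

    When exporting items with Scrapy, I noticed that the fields came out in a different order from the one defined in items.py (they appeared to be sorted alphabetically). How can the order be controlled?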

    2. fields_to_export

    While reading the official documentation I found this attribute, which is described as follows:

    fields_to_export

    A list with the name of the fields that will be exported, or None if you want to export all fields. Defaults to None.

    Some exporters (like CsvItemExporter) respect the order of the fields defined in this attribute.

    When using item objects that do not expose all their possible fields, exporters that do not support exporting a different subset of fields per item will only export the fields found in the first item exported. Use fields_to_export to define all the fields to be exported.

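    In short: this attribute is a list that determines which fields are exported, and some exporters, such as CsvItemExporter, also write the fields in the order given in the list.

    So all you need to do is pass a fields_to_export argument when creating the exporter; that controls both which fields are exported and their order.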

    3. Example

    pipelines.py

    from scrapy.exporters import CsvItemExporter, JsonLinesItemExporter

    # Fields to export, listed in the desired output order.
    fields_to_export = ['city_name', 'house_addr', 'house_class', 'house_size', 'house_facility', 'house_price',
                        'house_release_time']


    class JsonLinesItemPipeline:

        def __init__(self):
            # Exporters write bytes, so the file is opened in binary mode.
            # The storages/ directory must already exist.
            self.file = open('storages/renting.jl', 'wb')
            self.exporter = JsonLinesItemExporter(self.file, encoding='utf-8', fields_to_export=fields_to_export)

        def open_spider(self, spider):
            self.exporter.start_exporting()

        def process_item(self, item, spider):
            self.exporter.export_item(item)
            return item

        def close_spider(self, spider):
            # Finish the export before closing the underlying file.
            self.exporter.finish_exporting()
            self.file.close()


    class CsvItemPipeline:

        def __init__(self):
            self.file = open('storages/renting.csv', 'wb')
            self.exporter = CsvItemExporter(self.file, fields_to_export=fields_to_export)

        def open_spider(self, spider):
            self.exporter.start_exporting()

        def process_item(self, item, spider):
            self.exporter.export_item(item)
            return item

        def close_spider(self, spider):
            # Finish the export before closing the underlying file.
            self.exporter.finish_exporting()
            self.file.close()
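
    The pipelines only run once they are enabled in the project settings. A minimal sketch of the relevant part of settings.py follows; the module path myproject.pipelines is an assumption (use your own project's package name), and the priority numbers are arbitrary values within Scrapy's usual 0 to 1000 range.

```python
# settings.py (fragment)
# 'myproject' is a hypothetical package name; replace it with your project's.
# Lower numbers run earlier; both pipelines return the item, so either order works.
ITEM_PIPELINES = {
    'myproject.pipelines.JsonLinesItemPipeline': 300,
    'myproject.pipelines.CsvItemPipeline': 400,
}
```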
    


  • Original post: https://www.cnblogs.com/pineapple-py/p/14613390.html
Copyright © 2020-2023  润新知