• Python--csv文件处理


    CSV(Comma-Separator Values)逗号分割值,由于是纯文本文件,任何编辑器都可以打开。下面用csv和pandas两种方式进行csv文件操作  

    原始csv文件内容

    Supplier Name,Invoice Number,Part Number,Cost,Purchase Date
    Supplier X,001-1001,2341,$500.00 ,1/20/14
    Supplier X,001-1001,2341,$500.00 ,1/20/14
    Supplier X,001-1001,5467,$750.00 ,1/20/14
    Supplier X,001-1001,5467,$750.00 ,1/20/14
    Supplier Y,50-9501,7009,$250.00 ,1/30/14
    Supplier Y,50-9501,7009,$250.00 ,1/30/14
    Supplier Y,50-9505,6650,$125.00 ,2002/3/14
    Supplier Y,50-9505,6650,$125.00 ,2002/3/14
    Supplier Z,920-4803,3321,$615.00 ,2002/3/14
    Supplier Z,920-4804,3321,$615.00 ,2002/10/14
    Supplier Z,920-4805,3321,"$6,015.00 ",2/17/14
    Supplier Z,920-4806,3321,"$1,006,015.00 ",2/24/14

    1. csv包操作csv文件

    #coding=utf-8
    
    import sys
    import csv
    import re
    
    read_file = sys.argv[1]
    write_file = sys.argv[2]
    
    with open(read_file, "r") as readfile:
        with open(write_file, "w") as writefile:
            reader = csv.reader(readfile, delimiter=",")
            writer = csv.writer(writefile, delimiter=",")
            header = next(reader)
            writer.writerow(header)
            for rowlist in reader:
                #通过正则表达是进行行匹配
                if re.match(r"^001-*.", str(rowlist[1])):
                    print (rowlist)
                    writer.writerow(rowlist)

    >>> D:Pystu>python parsecsvfile.py supplier_data.csv ceshi.csv
    >>> Supplier Name,Invoice Number,Part Number,Cost,Purchase Date
    >>> Supplier X,001-1001,2341,$500.00 ,1/20/14
    >>> Supplier X,001-1001,2341,$500.00 ,1/20/14
    >>> Supplier X,001-1001,5467,$750.00 ,1/20/14
    >>> Supplier X,001-1001,5467,$750.00 ,1/20/14

    2. pandas包操作csv文件

    #coding=utf-8
    
    ''' 运用pandas包解析csv文件'''
    import pandas
    from pandas import Series,DataFrame
    import sys
    
    file_path = sys.argv[1]
    write_path = sys.argv[2]
    data_frame = pandas.read_csv(file_path)
    #print (data_frame)
    
    #注意str的使用
    data_frame["Cost"] = data_frame["Cost"].str.replace(",", "").str.strip("$").astype(float)
    #print (data_frame)
    
    newa = data_frame.loc[data_frame["Cost"] > 600, :]
    #print (newa)
    newa.to_csv(write_path, index = False)

    >>> D:Pystu>python parse_csv_file_by_pandas.py supplier_data.csv ceshi.csv
    >>> Supplier Name,Invoice Number,Part Number,Cost,Purchase Date
    >>> Supplier X,001-1001,5467,750.0,1/20/14
    >>> Supplier X,001-1001,5467,750.0,1/20/14
    >>> Supplier Z,920-4803,3321,615.0,2002/3/14
    >>> Supplier Z,920-4804,3321,615.0,2002/10/14
    >>> Supplier Z,920-4805,3321,6015.0,2/17/14
    >>> Supplier Z,920-4806,3321,1006015.0,2/24/14

  • 相关阅读:
    mysql读写分离
    mysql主从同步
    扫描与抓包
    加密与入侵检查
    监控
    selinux
    预期交互
    python发送邮件
    linux下安装虚拟环境
    博弈论
  • 原文地址:https://www.cnblogs.com/xiaobingqianrui/p/8421454.html
Copyright © 2020-2023  润新知