• Python Pandas read_csv error


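    To deduplicate the text (comparing the previously collected records against one another and deleting the repeats), I wrote the following code: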

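    #-*- coding: utf-8 -*-
    import pandas as pd

    inputfile = 'e:/data/H_KJ300F-JAC2101W.txt'  # raw comment file, one comment per line
    outputfile = 'e:/data/H_KJ300F-JAC2101W_process_1.txt'  # where the processed comments are saved
    # header=None: the file has no header row, so column 0 holds the comments
    data = pd.read_csv(inputfile, encoding='utf-8', header=None)
    l1 = len(data)  # comment count before deduplication
    data = pd.DataFrame(data[0].unique())  # keep the first occurrence of each distinct comment
    l2 = len(data)  # comment count after deduplication
    data.to_csv(outputfile, index=False, header=False, encoding='utf-8')
    print(u'Removed %s duplicate comments.' % (l1 - l2))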

    Error:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "D:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 646, in parser_f
        return _read(filepath_or_buffer, kwds)
      File "D:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 401, in _read
        data = parser.read()
      File "D:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 939, in read
        ret = self._engine.read(nrows)
      File "D:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1508, in read
        data = self._reader.read(nrows)
      File "pandas\parser.pyx", line 848, in pandas.parser.TextReader.read (pandas\parser.c:10415)
      File "pandas\parser.pyx", line 870, in pandas.parser.TextReader._read_low_memory (pandas\parser.c:10691)
      File "pandas\parser.pyx", line 924, in pandas.parser.TextReader._read_rows (pandas\parser.c:11437)
      File "pandas\parser.pyx", line 911, in pandas.parser.TextReader._tokenize_rows (pandas\parser.c:11308)
      File "pandas\parser.pyx", line 2024, in pandas.parser.raise_parser_error (pandas\parser.c:27037)
    pandas.io.common.CParserError: Error tokenizing data. C error: Expected 1 fields in line 360, saw 2

    Running the same call again (>>> data = pd.read_csv(inputfile, encoding='utf-8', header=None)) failed through the same frames, this time ending in:

    pandas.io.common.CParserError: Error tokenizing data. C error: Expected 1 fields in line 361, saw 2

    Fix: replace every half-width comma "," in the whole file with the full-width comma "，".
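    The post does not show how the replacement was done. One way to script it is sketched below, assuming the file is UTF-8; to_fullwidth_commas and convert_file are illustrative names, not part of the original code, and the sketch writes a converted copy rather than editing the file in place.

```python
from pathlib import Path


def to_fullwidth_commas(text: str) -> str:
    """Replace every half-width comma ',' (U+002C) with a full-width comma '，' (U+FF0C)."""
    return text.replace(",", "\uff0c")


def convert_file(src: str, dst: str, encoding: str = "utf-8") -> int:
    """Write a copy of src to dst with all half-width commas converted.

    Returns the number of commas that were replaced.
    """
    text = Path(src).read_text(encoding=encoding)
    Path(dst).write_text(to_fullwidth_commas(text), encoding=encoding)
    return text.count(",")
```

    For example, convert_file('e:/data/H_KJ300F-JAC2101W.txt', 'e:/data/H_KJ300F-JAC2101W_fullwidth.txt') would write a converted copy (the second file name is made up), and the dedup script would then read that copy. Note that the change is visible in the result: the saved comments contain full-width commas instead of the original ones.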

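    Cause: when no separator is specified, read_csv uses the half-width comma "," as the field separator by default, so any comment containing a comma is split into two or more fields. pandas takes the number of columns from the first line (1 here), and line 360 produced 2 fields, which is exactly what "Expected 1 fields in line 360, saw 2" reports.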
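    The failure can be reproduced without the original file. The sketch below uses a made-up three-line sample (SAMPLE is hypothetical data) and, as an alternative to editing the data, reads each line as one comment with plain Python so commas never act as separators. This is not the author's fix; it assumes one comment per line, as the original code does (header=None, column 0). Recent pandas versions raise pandas.errors.ParserError rather than pandas.io.common.CParserError.

```python
import io

import pandas as pd

# Hypothetical sample: one comment per line; the second comment contains a
# half-width comma ("fast shipping, intact packaging"), the third repeats the first.
SAMPLE = "质量不错\n物流很快,包装完好\n质量不错\n"


def read_with_default_sep(text: str) -> pd.DataFrame:
    """Default read_csv (sep=','): the comma-containing line yields 2 fields."""
    return pd.read_csv(io.StringIO(text), header=None)


def read_one_comment_per_line(text: str) -> pd.DataFrame:
    """Treat each non-blank line as one comment, so commas are never separators."""
    lines = [line for line in text.splitlines() if line.strip()]
    return pd.DataFrame({0: lines})


try:
    read_with_default_sep(SAMPLE)
except pd.errors.ParserError as exc:
    print("ParserError:", exc)

data = read_one_comment_per_line(SAMPLE)
l1 = len(data)
data = pd.DataFrame(data[0].unique())  # same dedup step as the original script
print("Removed %s duplicate comments." % (l1 - len(data)))  # prints: Removed 1 duplicate comments.
```

    For the real file, the text would come from open(inputfile, encoding='utf-8').read(). Another option that keeps read_csv is passing a sep character that never occurs in the comments, e.g. sep='\t' if the file is known to contain no tabs; that is an assumption about the data that should be checked first.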

  • Original article: https://www.cnblogs.com/a1397240667/p/6812807.html
Copyright © 2020-2023  润新知