Python Pandas read_csv error

To deduplicate text (compare the previously collected records pairwise and delete the duplicates), I wrote the following code.

# -*- coding: utf-8 -*-
import pandas as pd

inputfile = 'e:/data/H_KJ300F-JAC2101W.txt'  # comment file
outputfile = 'e:/data/H_KJ300F-JAC2101W_process_1.txt'  # path where the processed comments are saved
data = pd.read_csv(inputfile, encoding='utf-8', header=None)
l1 = len(data)  # number of rows before deduplication
data = pd.DataFrame(data[0].unique())  # keep only the unique values of the first column
l2 = len(data)  # number of rows after deduplication
data.to_csv(outputfile, index=False, header=False, encoding='utf-8')
print(u'Removed %s comments.' % (l1 - l2))

Error:

>>> data = pd.read_csv(inputfile, encoding='utf-8', header=None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 646, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "D:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 401, in _read
    data = parser.read()
  File "D:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 939, in read
    ret = self._engine.read(nrows)
  File "D:\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1508, in read
    data = self._reader.read(nrows)
  File "pandas\parser.pyx", line 848, in pandas.parser.TextReader.read (pandas\parser.c:10415)
  File "pandas\parser.pyx", line 870, in pandas.parser.TextReader._read_low_memory (pandas\parser.c:10691)
  File "pandas\parser.pyx", line 924, in pandas.parser.TextReader._read_rows (pandas\parser.c:11437)
  File "pandas\parser.pyx", line 911, in pandas.parser.TextReader._tokenize_rows (pandas\parser.c:11308)
  File "pandas\parser.pyx", line 2024, in pandas.parser.raise_parser_error (pandas\parser.c:27037)
pandas.io.common.CParserError: Error tokenizing data. C error: Expected 1 fields in line 360, saw 2

A second run of the same call went through the same frames and ended with:

pandas.io.common.CParserError: Error tokenizing data. C error: Expected 1 fields in line 361, saw 2

Solution: replace every half-width "," in the file with the full-width "，".
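If editing the data file is not desirable, one possible alternative is to skip CSV parsing and treat each whole line as one comment. The following is only a sketch under that assumption; the helper name dedupe_lines and the sample comments are made up for illustration and are not from the original post.

```python
import os
import tempfile

import pandas as pd


def dedupe_lines(inputfile, outputfile, encoding='utf-8'):
    """Drop duplicate lines, treating each whole line as a single comment.

    Hypothetical helper: it never splits on commas, so half-width commas
    inside a comment cannot trigger a tokenizing error.
    """
    with open(inputfile, encoding=encoding) as f:
        # One comment per line; blank lines are skipped.
        lines = [line.rstrip('\n') for line in f if line.strip()]
    data = pd.Series(lines)
    unique = data.drop_duplicates()  # keeps the first occurrence, preserves order
    with open(outputfile, 'w', encoding=encoding) as f:
        f.write('\n'.join(unique) + '\n')
    return len(data) - len(unique)


# Small demonstration on a temporary file (made-up sample comments).
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, 'comments.txt')
dst = os.path.join(tmpdir, 'comments_dedup.txt')
with open(src, 'w', encoding='utf-8') as f:
    f.write('good product\nquiet, works well\ngood product\n')

removed = dedupe_lines(src, dst)
print('Removed %s comments.' % removed)
```

Unlike the replace-the-commas fix, this leaves the comment text unchanged, at the cost of not using read_csv at all.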

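Cause: when no separator is specified, read_csv uses "," as the delimiter by default. The first lines of the file contain no comma, so the parser expects one field per line; a later line that contains a half-width comma is split into two fields, which is exactly what "Expected 1 fields in line 360, saw 2" reports.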
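The behavior can be reproduced on a tiny in-memory sample. This is a minimal sketch with made-up data; it assumes a recent pandas, where the exception is exposed as pd.errors.ParserError (the traceback in this post comes from an older release, where it appears as pandas.io.common.CParserError).

```python
import io

import pandas as pd

# Made-up sample: the first two lines have no comma, the third one does.
sample = 'good product\nworks well\nquiet, works well\n'

try:
    # No sep given, so "," is the delimiter and line 3 yields two fields.
    pd.read_csv(io.StringIO(sample), header=None)
    message = None
except pd.errors.ParserError as exc:
    message = str(exc)

print(message)
```

The message names the first line whose field count exceeds what the parser inferred from the start of the file, which is a quick way to locate the offending comment.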

Original article: https://www.cnblogs.com/a1397240667/p/6812807.html