Installation
pip install chardet
If pip installs are slow, see https://www.cnblogs.com/yunhgu/p/14749066.html for how to change the package index source, which makes them much faster.
Usage
#! /usr/bin/env python
# -*- coding: utf-8 -*-#
# -------------------------------------------------------------------------------
# Name: demo
# Author: yunhgu
# Date: 2021/6/30 9:57
# Description:
# -------------------------------------------------------------------------------
import chardet
import time
from chardet.universaldetector import UniversalDetector
from functools import wraps


def timethis(func):
    """Print the CPU time (time.process_time) spent in each call to func."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.process_time()
        r = func(*args, **kwargs)
        end = time.process_time()
        print('{}.{} : {}'.format(func.__module__, func.__name__, end - start))
        return r
    return wrapper


@timethis
def get_encoding(path):
    # Create a detector object
    detector = UniversalDetector()
    with open(path, "rb") as f:
        # Iterate over the file lazily instead of calling f.readlines(),
        # so the whole file is not loaded into memory first
        for line in f:
            # Feed the data piece by piece until the detector reaches its threshold
            detector.feed(line)
            if detector.done:
                break
    # Close the detector to finalize the result
    detector.close()
    # Print the detection result
    encoding = detector.result["encoding"]
    print(f"encoding : {encoding}")


@timethis
def get_encoding2(path):
    # Read the whole file and detect its encoding in a single call
    with open(path, "rb") as f:
        encoding = chardet.detect(f.read())["encoding"]
    print(f"encoding : {encoding}")


if __name__ == '__main__':
    path = "42W-中文版(1).csv"
    get_encoding(path)
    get_encoding2(path)
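get_encoding feeds the detector one line at a time, so a file with very long lines (or no newlines at all) is still fed in large pieces. The following is only a minimal sketch of the same incremental idea using fixed-size binary chunks; the function name detect_in_chunks, the 64 KiB default chunk size, and the optional detector parameter are assumptions made for illustration and are not part of chardet or of the original demo. It relies only on the feed(), done, close() and result members of UniversalDetector that the code above already uses.

```python
def detect_in_chunks(path, chunk_size=64 * 1024, detector=None):
    """Feed a file to an incremental detector in fixed-size binary chunks.

    `detector` is a hypothetical hook for illustration: any object exposing
    feed(), done, close() and result works. By default a chardet
    UniversalDetector is created.
    """
    if detector is None:
        # Imported lazily so the function can also be tried with a stand-in.
        from chardet.universaldetector import UniversalDetector
        detector = UniversalDetector()
    with open(path, "rb") as f:
        # iter(callable, sentinel) keeps calling f.read() until it returns b"" at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            detector.feed(chunk)
            # Stop early once the detector is confident enough.
            if detector.done:
                break
    # close() finalizes the result even if the loop ended at EOF.
    detector.close()
    # The result dict has at least the "encoding" and "confidence" keys.
    return detector.result
```

Calling result = detect_in_chunks(path) and then reading result["encoding"] and result["confidence"] returns the values to the caller instead of printing them, so the detected encoding can be passed on to open(path, encoding=...).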
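In the code above, the second function (get_encoding2) shows the ordinary usage, which is fine for relatively small files. For large files, the approach in the first function (get_encoding) is faster, since the detector can stop as soon as it reaches its threshold instead of processing the whole file.
Below is a timing comparison for reading a file that contains 420,000 records.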