Converting file encodings with Python

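I previously wrote up a way to convert file encodings with PowerShell, but PowerShell's support is not very complete, so it did not work well. Here is another version written in Python.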

Approach

Detect the encoding of the source file. If it is not UTF-8, convert it; otherwise skip it.

Code

import chardet
import sys
import codecs


def findEncoding(s):
    # Read the raw bytes so chardet can guess the encoding
    with open(s, mode='rb') as file:
        buf = file.read()
    result = chardet.detect(buf)
    return result['encoding']


def convertEncoding(s):
    encoding = findEncoding(s)
    if encoding is None:
        # chardet reports None when it cannot make a guess
        print("%s: encoding could not be detected, skipped" % s)
    elif encoding.lower() not in ('utf-8', 'ascii'):
        print("convert %s (%s) to utf-8" % (s, encoding))
        contents = ''
        with codecs.open(s, "r", encoding) as sourceFile:
            contents = sourceFile.read()

        with codecs.open(s, "w", "utf-8") as targetFile:
            targetFile.write(contents)

    else:
        print("%s encoding is %s, there is no need to convert" % (s, encoding))


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("error filename")
    else:
        convertEncoding(sys.argv[1])

I tested this in practice and the conversion works.
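If you want to check the read-then-rewrite step yourself without installing chardet, here is a minimal sketch. It assumes the source encoding is already known (GBK in this demo); `reencode_to_utf8` and `demo` are hypothetical names used for illustration and are not part of the script above.

```python
import os
import tempfile


def reencode_to_utf8(path, source_encoding):
    """Rewrite the file at `path` as UTF-8, assuming it is currently in `source_encoding`."""
    # newline="" leaves line endings exactly as they are in the file
    with open(path, "r", encoding=source_encoding, newline="") as source_file:
        contents = source_file.read()
    with open(path, "w", encoding="utf-8", newline="") as target_file:
        target_file.write(contents)


def demo():
    """Write a GBK-encoded temp file, convert it, and return the resulting bytes."""
    text = "编码转换测试\r\n"
    fd, path = tempfile.mkstemp(suffix=".c")
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(text.encode("gbk"))
        reencode_to_utf8(path, "gbk")
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    # Compare the converted bytes with the UTF-8 encoding of the same text
    print(demo() == "编码转换测试\r\n".encode("utf-8"))
```

The sketch uses the built-in open with an encoding argument instead of codecs.open; both decode on read and encode on write, so the idea is the same.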

Key points

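chardet: this module detects the encoding of a piece of data. When detection finishes, chardet.detect returns a dict. The dict has an encoding key and a confidence key, and their meanings are what the names suggest.
with ... as: this syntax is very handy, especially when opening files. The file is closed automatically, even if an exception is raised, so you avoid problems such as a file staying locked because you forgot to close it.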
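The confidence value is not used by the script above, but it can be used to skip files where the guess is weak. Below is a minimal sketch of that idea. The helper name `pick_encoding` and the 0.8 threshold are my own assumptions, and the sample dicts are hand-written stand-ins shaped like a detection result, not real chardet output.

```python
def pick_encoding(result, min_confidence=0.8):
    """Return the detected encoding in lower case, or None if the guess is missing or weak.

    `result` is a dict with 'encoding' and 'confidence' keys, shaped like
    the dict that chardet.detect() returns.
    """
    encoding = result.get("encoding")
    confidence = result.get("confidence") or 0.0
    if encoding is None or confidence < min_confidence:
        return None
    return encoding.lower()


if __name__ == "__main__":
    # Hand-written sample dicts for illustration only
    print(pick_encoding({"encoding": "GB2312", "confidence": 0.99}))
    print(pick_encoding({"encoding": "windows-1252", "confidence": 0.3}))
    print(pick_encoding({"encoding": None, "confidence": 0.0}))
```

In the conversion script you could pass the whole dict from chardet.detect(buf) to a helper like this and skip the file when it returns None.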

Batch conversion

import chardet
import sys
import codecs
import os


def findEncoding(s):
    # Read the raw bytes so chardet can guess the encoding
    with open(s, mode='rb') as file:
        buf = file.read()
    result = chardet.detect(buf)
    return result['encoding']


def convertEncoding(s):
    if os.access(s, os.W_OK):
        encoding = findEncoding(s)
        if encoding is None:
            # chardet reports None when it cannot make a guess
            print("%s: encoding could not be detected, skipped" % s)
        elif encoding.lower() not in ('utf-8', 'ascii'):
            print("convert %s (%s) to utf-8" % (s, encoding))
            contents = ''
            with codecs.open(s, "r", encoding) as sourceFile:
                contents = sourceFile.read()

            with codecs.open(s, "w", "utf-8") as targetFile:
                targetFile.write(contents)

        else:
            print("%s encoding is %s, there is no need to convert" % (s, encoding))
    else:
        print("%s read only" % s)


def getAllFile(path, suffix=''):
    "Recursively collect files under path whose names end with suffix"
    fpath = []

    # The default suffix '' matches every file name
    for root, dirs, fname in os.walk(path):
        for name in fname:
            if name.endswith(suffix):
                fpath.append(os.path.join(root, name))

    return fpath


def convertAll(path):
    fclist = getAllFile(path, ".c")
    fhlist = getAllFile(path, ".h")
    flist = fclist + fhlist
    for fname in flist:
        convertEncoding(fname)


if __name__ == "__main__":
    path = ''
    if len(sys.argv) == 1:
        path = os.getcwd()

    elif len(sys.argv) == 2:
        path = sys.argv[1]
    else:
        print("error parameter")
        sys.exit(1)

    convertAll(path)

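You can pass a directory as the argument, or run the script with no argument to use the current directory. Subdirectories are traversed recursively.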

Key points

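os.walk: walks a directory tree and yields every file in it.
os.access: checks access permissions on a file (os.W_OK tests whether it is writable).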
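As a small variation on getAllFile, str.endswith also accepts a tuple of suffixes, so the .c and .h files can be collected in a single walk. This is only a sketch: `find_sources` and `demo` are hypothetical names, and the demo builds a throwaway directory tree so it can run anywhere.

```python
import os
import tempfile


def find_sources(path, suffixes=(".c", ".h")):
    """Recursively collect files under `path` whose names end with one of `suffixes`."""
    matches = []
    for root, _dirs, names in os.walk(path):
        for name in names:
            # endswith accepts a tuple and matches any of its items
            if name.endswith(suffixes):
                matches.append(os.path.join(root, name))
    return sorted(matches)


def demo():
    """Build a small tree, then return the matching relative paths and their writability."""
    with tempfile.TemporaryDirectory() as top:
        os.makedirs(os.path.join(top, "sub"))
        for rel in ("main.c", "notes.txt", os.path.join("sub", "util.h")):
            with open(os.path.join(top, rel), "w") as f:
                f.write("")
        found = find_sources(top)
        relative = [os.path.relpath(p, top) for p in found]
        writable = [os.access(p, os.W_OK) for p in found]
        return relative, writable


if __name__ == "__main__":
    print(demo())
```

The suffixes argument must be a tuple (or a single string); passing a list to endswith raises a TypeError.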

Original article: https://www.cnblogs.com/WeyneChen/p/6339962.html