• Converting file encodings with Python


    I previously wrote up a way to convert file encodings with PowerShell, but PowerShell's support is incomplete, so it did not work well. Here is another version in Python.

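    Approach

    Detect the encoding of the source file. If it is not UTF-8 (plain ASCII is already valid UTF-8), convert it; otherwise skip it.

    Code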

    import codecs
    import sys
    
    import chardet  # third-party: pip install chardet
    
    
    def findEncoding(s):
        """Return the encoding chardet guesses for file s (may be None)."""
        with open(s, mode='rb') as file:
            buf = file.read()
        result = chardet.detect(buf)
        return result['encoding']
    
    
    def convertEncoding(s):
        encoding = findEncoding(s)
        if encoding is None:
            # chardet could not guess (e.g. an empty or binary file); leave it alone
            print("%s: encoding not detected, skipped" % s)
        elif encoding.lower() not in ('utf-8', 'ascii'):
            print("convert %s (%s) to utf-8" % (s, encoding))
            # read everything first, because the same file is overwritten next
            with codecs.open(s, "r", encoding) as sourceFile:
                contents = sourceFile.read()
    
            with codecs.open(s, "w", "utf-8") as targetFile:
                targetFile.write(contents)
        else:
            print("%s encoding is %s, there is no need to convert" % (s, encoding))
    
    
    if __name__ == "__main__":
        if len(sys.argv) != 2:
            print("usage: python %s <filename>" % sys.argv[0])
        else:
            convertEncoding(sys.argv[1])
    

    In testing, the conversion worked.
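One caveat with the script above: it reopens the same file for writing, so a failure partway through the write could leave the file truncated. The following is a minimal sketch of one way to reduce that risk, assuming the source encoding is already known. The name `convert_file_atomically` is hypothetical and not part of the original script; it writes to a temporary file in the same directory and only then swaps it in with `os.replace`.

```python
import os
import shutil
import tempfile


def convert_file_atomically(path, source_encoding, target_encoding="utf-8"):
    """Re-encode path, replacing it only after the new bytes are fully written."""
    # Decode first: a wrong source_encoding raises here, before anything is written.
    # newline="" keeps the original line endings untouched.
    with open(path, "r", encoding=source_encoding, newline="") as src:
        contents = src.read()

    # Create the temp file next to the original so os.replace stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding=target_encoding, newline="") as dst:
            dst.write(contents)
        shutil.copymode(path, tmp_path)  # keep the original permission bits
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)  # clean up the partial temp file, then re-raise
        raise
```

A call such as `convert_file_atomically("main.c", "gbk")` would rewrite `main.c` as UTF-8. The trade-off is a brief need for extra disk space equal to the size of the file being converted.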

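    Key points

    1. chardet: this module detects character encodings. chardet.detect() returns a dict whose keys include encoding (the name of the guessed encoding) and confidence (how sure the guess is, from 0 to 1).
    2. with ... as: this syntax is very handy, especially for opening files. The file is closed when the block exits, even if an exception is raised, so a forgotten close() cannot leave the file locked.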
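Since the detection result carries a confidence value as well as an encoding name, the decision to convert can take both into account. Below is a minimal sketch under a few assumptions: the helper name `needs_conversion` and the 0.8 threshold are made up for illustration, and the dict passed in is expected to have the shape that `chardet.detect()` returns. Encoding names are normalized with `codecs.lookup` so that spellings such as `UTF-8` and `utf_8` compare equal.

```python
import codecs


def needs_conversion(result, min_confidence=0.8):
    """Decide from a chardet-style result dict whether to re-encode a file."""
    encoding = result.get("encoding")
    if encoding is None:
        return False  # nothing was detected, so do not touch the file
    if result.get("confidence", 0.0) < min_confidence:
        return False  # guess is too uncertain (threshold is an arbitrary choice)
    try:
        canonical = codecs.lookup(encoding).name  # e.g. "UTF-8" -> "utf-8"
    except LookupError:
        return False  # Python has no codec under that name
    return canonical not in ("utf-8", "ascii")
```

In the script this could be called as `needs_conversion(chardet.detect(buf))`. Skipping low-confidence guesses is the cautious option here, because decoding with a wrong guess and writing the result back can corrupt the file.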

    Batch conversion

    import codecs
    import os
    import sys
    
    import chardet  # third-party: pip install chardet
    
    
    def findEncoding(s):
        """Return the encoding chardet guesses for file s (may be None)."""
        with open(s, mode='rb') as file:
            buf = file.read()
        result = chardet.detect(buf)
        return result['encoding']
    
    
    def convertEncoding(s):
        if not os.access(s, os.W_OK):
            print("%s read only" % s)
            return
    
        encoding = findEncoding(s)
        if encoding is None:
            # chardet could not guess (e.g. an empty or binary file); leave it alone
            print("%s: encoding not detected, skipped" % s)
        elif encoding.lower() not in ('utf-8', 'ascii'):
            print("convert %s (%s) to utf-8" % (s, encoding))
            # read everything first, because the same file is overwritten next
            with codecs.open(s, "r", encoding) as sourceFile:
                contents = sourceFile.read()
    
            with codecs.open(s, "w", "utf-8") as targetFile:
                targetFile.write(contents)
        else:
            print("%s encoding is %s, there is no need to convert" % (s, encoding))
    
    
    def getAllFile(path, suffix=''):
        """Recursively collect files under path whose names end with suffix.
    
        suffix may be a string or a tuple of strings; '' matches every file.
        """
        fpath = []
        for root, dirs, fnames in os.walk(path):
            for name in fnames:
                if name.endswith(suffix):
                    fpath.append(os.path.join(root, name))
        return fpath
    
    
    def convertAll(path):
        # str.endswith accepts a tuple, so one walk covers both suffixes
        for fname in getAllFile(path, (".c", ".h")):
            convertEncoding(fname)
    
    
    if __name__ == "__main__":
        if len(sys.argv) == 1:
            path = os.getcwd()
        elif len(sys.argv) == 2:
            path = sys.argv[1]
        else:
            print("error parameter")
            sys.exit(1)
    
        convertAll(path)
    

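    You can pass a directory as the argument or run the script with no argument to use the current directory. Either way, subdirectories are traversed recursively.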

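    Key points

    1. os.walk: walks a directory tree and yields every file in it.
    2. os.access: checks a file's access permissions (here, whether it is writable).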
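The two calls can also be combined into a single generator. The sketch below is only an illustration: the name `find_writable_sources` is hypothetical, and skipping hidden directories such as `.git` is an added assumption that the original script does not make.

```python
import os


def find_writable_sources(path, suffixes=(".c", ".h")):
    """Yield writable files under path whose names end with one of suffixes."""
    for root, dirs, files in os.walk(path):
        # Pruning dirs in place tells os.walk not to descend into them.
        dirs[:] = [d for d in dirs if not d.startswith(".")]
        for name in files:
            full = os.path.join(root, name)
            # str.endswith accepts a tuple, so both suffixes are checked at once.
            if name.endswith(suffixes) and os.access(full, os.W_OK):
                yield full
```

A loop such as `for f in find_writable_sources("."): convertEncoding(f)` would then replace the separate listing and permission checks. Using a generator avoids building the full file list in memory before the first conversion starts.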
  • Original post: https://www.cnblogs.com/WeyneChen/p/6339962.html