• python decode unicode encode


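           Python represents text internally as unicode (the unicode type in Python 2), so an encoding conversion usually goes through unicode as the intermediate form: first decode the string from its source encoding into unicode, then encode that unicode into the target encoding.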
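In Python 3 the same pivot is explicit in the types: bytes holds encoded data and str holds unicode text. A minimal sketch, assuming UTF-8 input and a GBK target (the helper name convert is hypothetical):

```python
def convert(data: bytes, src: str, dst: str) -> bytes:
    """Re-encode bytes from encoding src to encoding dst, using unicode (str) as the pivot."""
    text = data.decode(src)   # step 1: decode the source bytes into unicode text
    return text.encode(dst)   # step 2: encode the unicode text into the target encoding


utf8_bytes = "你好".encode("utf-8")
gbk_bytes = convert(utf8_bytes, "utf-8", "gbk")
print(len(utf8_bytes), len(gbk_bytes))  # 6 4: three bytes per character in UTF-8, two in GBK
```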

           By default, a string literal in source code has the same encoding as the source file itself. Two cases illustrate this:

            1. s = u'你好'

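                The string is then unicode, Python's internal representation, regardless of the source file's encoding. (Note that sys.getdefaultencoding(), which returns 'ascii' on Python 2, reports the codec Python 2 uses for implicit str/unicode conversions, not the source file's encoding; it can be changed with: import sys; reload(sys); sys.setdefaultencoding('utf-8').) In this case, converting to another encoding only needs a direct call to encode with the target encoding.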

            2. # -*- coding: utf-8 -*-

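                s = '你好'

                Here the string is a UTF-8 encoded byte string. The coding declaration is needed because ASCII, Python 2's default source encoding, cannot represent Chinese characters.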
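For comparison, a small Python 3 sketch of the two cases. In Python 3 source files default to UTF-8, a plain literal is already unicode str, and the u prefix is accepted but changes nothing; the closest analogue of Python 2's plain literal is an explicitly encoded bytes value (the variable names here are illustrative):

```python
import sys

s1 = u"你好"                 # case 1: unicode text; in Python 3 identical to "你好"
s2 = "你好".encode("utf-8")  # analogue of case 2: UTF-8 encoded bytes

print(sys.getdefaultencoding())              # utf-8 on Python 3 (Python 2 reports ascii)
print(type(s1).__name__, type(s2).__name__)  # str bytes

# Case 1 only needs encode(); case 2 needs decode() first
gbk_from_text = s1.encode("gbk")
gbk_from_bytes = s2.decode("utf-8").encode("gbk")
print(gbk_from_text == gbk_from_bytes)       # True
```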

    isinstance(s, unicode)  # tests whether s is unicode: True if it is, False otherwise

    unicode(s, 'gb2312') and s.decode('gb2312') are equivalent: both decode a gb2312-encoded str into unicode.
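In Python 3 the two spellings survive as str(raw, encoding) and raw.decode(encoding), both of which turn encoded bytes into text. A quick check, using GB2312 as the original does (the variable names are illustrative):

```python
raw = "中文".encode("gb2312")  # GB2312-encoded bytes

via_constructor = str(raw, "gb2312")  # Python 3 counterpart of unicode(raw, 'gb2312')
via_method = raw.decode("gb2312")     # counterpart of raw.decode('gb2312')

print(via_constructor == via_method == "中文")  # True
```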

    s.__class__ shows whether s is a str (byte string) or a unicode object; it does not reveal which byte encoding a str uses.
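Python 3 renames the types: unicode becomes str, and Python 2's str becomes bytes, so the same checks look like the sketch below. As in Python 2, __class__ reports only the type; a bytes object carries no record of which encoding produced it. The describe helper is hypothetical:

```python
def describe(s):
    """Report whether s is text (unicode str) or encoded bytes."""
    if isinstance(s, str):
        return "text"
    if isinstance(s, (bytes, bytearray)):
        return "bytes"
    raise TypeError("expected str or bytes, got " + type(s).__name__)


print(describe("你好"), describe("你好".encode("utf-8")))  # text bytes
print("你好".__class__, b"abc".__class__)                   # <class 'str'> <class 'bytes'>
```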

    That's enough theory. To finish, here is a cure-all :)


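    #!/usr/bin/env python
    # coding=utf-8
    s = "中文"

    # Python 2: a unicode object only needs encode(); a byte string
    # (UTF-8 here, matching the coding line) must be decoded first
    if isinstance(s, unicode):
        # e.g. s = u"中文"
        print s.encode('gb2312')
    else:
        # e.g. s = "中文" (UTF-8 bytes)
        print s.decode('utf-8').encode('gb2312')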
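The same cure-all in Python 3, as a minimal sketch. The function name to_gb2312 and its source_encoding parameter are additions of this sketch; it assumes byte input is UTF-8 unless told otherwise:

```python
def to_gb2312(s, source_encoding="utf-8"):
    """Return s encoded as GB2312, accepting either text (str) or encoded bytes."""
    if isinstance(s, str):
        # already unicode text: encode directly
        return s.encode("gb2312")
    # encoded bytes: decode to unicode first, assuming source_encoding
    return s.decode(source_encoding).encode("gb2312")


a = to_gb2312("中文")
b = to_gb2312("中文".encode("utf-8"))
print(a == b)  # True
```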

    Code for the speech module:

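    # -*- coding: utf-8 -*-
    from __future__ import print_function  # makes print(a, b) print "a b" in Python 2
    import sys

    # Python 2 reports 'ascii' here unless the default encoding has been changed
    print('hello', sys.getdefaultencoding())


    def xfs_frame_info(words):
        """Build a frame: 5-byte header followed by the GBK-encoded text."""
        # Decode UTF-8 input into Python's internal unicode; unicode input is used as-is
        if isinstance(words, unicode):
            wordu = words
        else:
            wordu = words.decode('utf-8')

        # Encode the unicode text as GBK
        data = wordu.encode('gbk')

        # Length field = the two fixed bytes (frame_info[3], frame_info[4]) + the text
        length = len(data) + 2

        frame_info = bytearray(5)
        frame_info[0] = 0xfd                # frame header
        frame_info[1] = (length >> 8)       # length, high byte
        frame_info[2] = (length & 0x00ff)   # length, low byte
        frame_info[3] = 0x01                # fixed byte, counted in length
        frame_info[4] = 0x01                # fixed byte, counted in length

        buf = frame_info + data
        print("buf:", buf)

        return buf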
    
    if __name__ == "__main__":

        print("hello world")

        # Case 1: start from a unicode literal, encode it to UTF-8, then build the frame
        words1 = u'你好'
        print("origin unicode", words1)

        words = words1.encode('utf-8')
        print("utf-8 encoded", words)
        a = xfs_frame_info(words)
        print('a', a)

        # Case 2: start from a plain (UTF-8) byte string; decode it to unicode first
        words2 = '你好'
        print("original utf-8 encoded:", words2)
        print("is unicode:", isinstance(words2, unicode))  # False: no u prefix
        wordu = words2.decode('utf-8')
        print("unicode from utf-8 decode:", wordu)

        word_utf8 = wordu.encode('utf-8')
        print("utf-8 encoded", word_utf8)
        a = xfs_frame_info(word_utf8)
        print('a', a)

    Without the u prefix on '你好', one extra step is needed: decode it to unicode first.
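A Python 3 port of the frame builder, sketched under the assumption that the module expects exactly the layout built above (0xFD, a 2-byte big-endian length, two 0x01 bytes, then GBK text). The function name xfs_frame_info_py3 and the length check are additions of this sketch, not part of the original code:

```python
def xfs_frame_info_py3(words):
    """Build the same frame as xfs_frame_info: 0xFD, 2-byte length, 0x01, 0x01, GBK text."""
    # Accept text or UTF-8 bytes; bytes need the extra decode step
    if isinstance(words, (bytes, bytearray)):
        text = bytes(words).decode("utf-8")
    else:
        text = words
    data = text.encode("gbk")

    length = len(data) + 2  # the two 0x01 bytes plus the text
    if length > 0xFFFF:
        raise ValueError("text too long for a 2-byte length field")

    header = bytes([0xFD, length >> 8, length & 0xFF, 0x01, 0x01])
    return header + data


frame = xfs_frame_info_py3("你好")
print(frame.hex())
```

Accepting both str and bytes mirrors the two cases above: text goes straight to encode, while bytes are decoded first.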

  • Original article: https://www.cnblogs.com/cj2014/p/4236114.html