• Python character encoding exercises


    The exercises below are meant to deepen your understanding of character encoding in Python.

    # \x00 - \xff: all 256 byte values
    >>> a = range(256)
    >>> b = bytes(a)  # no encoding argument is needed for an iterable of ints
    >>> b
    b'\x00\x01\x02 ... \xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
    >>> b.decode('utf-8')  # fails
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 128: invalid start byte
    >>> b.decode('unicode-escape')  # works
    '\x00\x01\x02 ... \xf6÷\xf8ùú\xfbü\xfd\xfe\xff'
    # Aside: the statements above are equivalent to this one-liner
    >>> ''.join(list(map(chr, range(256))))
    '\x00\x01\x02 ... \xf6÷\xf8ùú\xfbü\xfd\xfe\xff'
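The same byte-to-character mapping can also be checked with the `latin-1` codec, which maps each byte value 0-255 to the code point with the same number. The following is a minimal sketch, not part of the original exercises; the variable names are only illustrative.

```python
# All 256 possible byte values.
data = bytes(range(256))

# latin-1 maps byte N to code point N, so decoding never fails.
text = data.decode('latin-1')
matches_chr = text == ''.join(map(chr, range(256)))
print(matches_chr)

# UTF-8 rejects the data: byte 0x80 cannot start a UTF-8 sequence.
error_position = None
try:
    data.decode('utf-8')
except UnicodeDecodeError as exc:
    error_position = exc.start
    print(error_position, hex(data[error_position]))
```

Because `latin-1` is a one-to-one mapping, `text.encode('latin-1')` gives back the original bytes unchanged.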

    >>> a = 'abc'
    >>> a
    'abc'
    >>> b = bytes(a, encoding='utf-8')  # option 1: convert 'abc' to bytes
    >>> b
    b'abc'
    >>> c = a.encode('utf-8')  # option 2: convert 'abc' to bytes, equivalent to option 1
    >>> c
    b'abc'

    # \x00 - \xff: all 256 byte values, using bytearray
    >>> a = range(256)
    >>> b = bytearray(a)
    >>> b
    bytearray(b'\x00\x01\x02 ... \xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff')
    >>> b.decode('unicode-escape')
    '\x00\x01\x02 ... \xf6÷\xf8ùú\xfbü\xfd\xfe\xff'

    # Encoding Chinese text
    >>> a = '中'
    >>> a
    '中'
    >>> b = a.encode('gbk')
    >>> b
    b'\xd6\xd0'
    >>> c = a.encode('utf-8')
    >>> c
    b'\xe4\xb8\xad'
    >>> d = a.encode('unicode-escape')
    >>> d
    b'\\u4e2d'
    >>> e = a.encode('cp936')
    >>> e
    b'\xd6\xd0'

    # Decoding Chinese text
    >>> a.decode('utf-8')  # str has no decode(); only bytes objects can be decoded
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'str' object has no attribute 'decode'
    >>> b.decode()  # the default codec is utf-8, but b holds gbk bytes
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd6 in position 0: invalid continuation byte
    >>> b.decode('utf-8')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd6 in position 0: invalid continuation byte
    >>> b.decode('gbk')
    '中'
    >>> b.decode('cp936')  # in Python, cp936 is an alias for the gbk codec, so the two names are interchangeable
    '中'
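The encode and decode steps above can be collected into a short script. This is a sketch under the assumption that lossy decoding is acceptable in the last step; `errors='replace'` is a standard argument of `bytes.decode`, while the variable names are only illustrative.

```python
text = '中'

# Encode the same character with two different codecs.
gbk_bytes = text.encode('gbk')
utf8_bytes = text.encode('utf-8')
print(gbk_bytes, utf8_bytes)

# Decoding succeeds when the codec matches the one used for encoding.
gbk_round_trip = gbk_bytes.decode('gbk')
utf8_round_trip = utf8_bytes.decode('utf-8')
print(gbk_round_trip == text, utf8_round_trip == text)

# With errors='replace', undecodable bytes become U+FFFD instead of raising.
lossy = gbk_bytes.decode('utf-8', errors='replace')
print(lossy)
```

The replacement character in `lossy` signals that the bytes were decoded with the wrong codec; the original text cannot be recovered from it.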

    Full list of encodings officially supported by Python: https://docs.python.org/3/library/codecs.html#standard-encodings
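To see how the names in that table map onto actual codecs, `codecs.lookup()` can resolve a name or alias to its canonical codec. A minimal sketch; the list of names tried here is only an illustrative selection.

```python
import codecs

# codecs.lookup() resolves an encoding name or alias to the underlying codec.
for name in ('gbk', 'cp936', 'utf8', 'UTF-8', 'latin-1'):
    print(name, '->', codecs.lookup(name).name)

# cp936 and gbk resolve to the same codec in Python.
same_codec = codecs.lookup('cp936').name == codecs.lookup('gbk').name
print(same_codec)
```

An unknown name raises `LookupError`, which makes this a quick way to check whether an encoding is available before using it.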

  • Original article: https://www.cnblogs.com/hhh5460/p/5571897.html