Item 3: Know the Differences Between bytes and str

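In Python, there are two types that represent sequences of character data: bytes and str. Instances of bytes contain raw, unsigned 8-bit values (often displayed in the ASCII encoding):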
```
>>> a = b'h\x65llo'
>>> a
b'hello'
>>> print(list(a))
[104, 101, 108, 108, 111]
>>> print(a)
b'hello'
```
Instances of str contain Unicode code points that represent textual characters from human languages:
```
>>> a = 'a\u0300 propos'
>>> print(list(a))
['a', '̀', ' ', 'p', 'r', 'o', 'p', 'o', 's']
>>> print(a)
à propos
```
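Importantly, str instances do not have an associated binary encoding, and bytes instances do not have an associated text encoding.
- To convert Unicode data to binary data, you must call the encode method of str.
- To convert binary data to Unicode data, you must call the decode method of bytes.

You can explicitly specify the encoding you want to use for these methods or accept the default. In Python 3, encode and decode default to UTF-8, whereas files opened in text mode use a system-dependent default that is usually, but not always, UTF-8 (see the details below).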
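
As a minimal sketch of the two conversions just described, the snippet below encodes a str to bytes and decodes it back, naming the encoding explicitly each time. The variable names are illustrative only.

```python
text = 'a\u0300 propos'            # str: a sequence of Unicode code points

# str -> bytes: name the encoding explicitly instead of relying on a default.
data = text.encode('utf-8')
print(data)

# bytes -> str: decoding with the same encoding recovers the original text.
round_tripped = data.decode('utf-8')
print(round_tripped == text)

# The same text maps to a different byte sequence under another encoding.
utf16_data = text.encode('utf-16-le')
print(len(data), len(utf16_data))
```

The length difference in the last line is a reminder that a str has no inherent byte representation until an encoding is chosen.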

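When you're writing Python programs, it's important to encode and decode Unicode data at the furthest boundary of your interfaces; this approach is often called the Unicode sandwich. The core of your program should use the str type containing Unicode data and should not assume anything about character encodings. This approach lets you be very accepting of alternative text encodings (such as Latin-1, Shift JIS, and Big5) while being strict about your output text encoding (ideally, UTF-8).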
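
One way to picture the Unicode sandwich is the sketch below: bytes are decoded once on the way in, the core logic sees only str, and the result is encoded once on the way out. The functions `shout` and `process` are hypothetical names invented for this illustration, not part of any library.

```python
def shout(text: str) -> str:
    """Core logic: works purely on str and knows nothing about encodings."""
    return text.upper()


def process(raw: bytes, input_encoding: str) -> bytes:
    """Boundary layer (hypothetical): decode on input, encode on output."""
    text = raw.decode(input_encoding)   # accept whatever encoding the caller names
    result = shout(text)                # the middle of the sandwich is all str
    return result.encode('utf-8')       # output is always UTF-8


latin1_input = 'caf\u00e9'.encode('latin-1')   # b'caf\xe9'
output = process(latin1_input, 'latin-1')
print(output)
```

Because `shout` never touches bytes, supporting another input encoding only requires a different argument at the boundary.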

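The split between character types leads to two common situations in Python code:
- You want to operate on raw 8-bit sequences that contain UTF-8-encoded strings (or some other encoding).
- You want to operate on Unicode strings that have no specific encoding.

You'll often need two helper functions to convert between these cases and to ensure that the type of input values matches your code's expectations.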

The first function takes a bytes or str instance and always returns a str:
```
def to_str(bytes_or_str):
    if isinstance(bytes_or_str, bytes):
        value = bytes_or_str.decode('utf-8')
    else:
        value = bytes_or_str
    return value  # Instance of str

print(repr(to_str(b'foo')))
print(repr(to_str('bar')))
```
The second function takes a bytes or str instance and always returns a bytes:
```
def to_bytes(bytes_or_str):
    if isinstance(bytes_or_str, str):
        value = bytes_or_str.encode('utf-8')
    else:
        value = bytes_or_str
    return value  # Instance of bytes

print(repr(to_bytes(b'foo')))
print(repr(to_bytes('bar')))
```
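There are two big gotchas when dealing with raw 8-bit values and Unicode strings in Python.

The first issue is that bytes and str seem to work the same way, but their instances are not compatible with each other, so you must be deliberate about the types of character sequences that you're passing around.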
```
>>> print(b'one'+b'two')
b'onetwo'
>>> print('one'+'two')
onetwo
```
With the + operator, you can add bytes to bytes and str to str, but you can't add bytes to str:
```
>>> print('one'+b'two')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can only concatenate str (not "bytes") to str
```
With binary operators, you can compare bytes to bytes and str to str:
```
>>> assert 'red' > 'blue'
>>> assert b'red' > b'blue'
```
But you can't compare a str instance directly to a bytes instance:
```
>>> assert b'red' > 'blue'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '>' not supported between instances of 'bytes' and 'str'
```
Comparing bytes and str instances for equality always evaluates to False, even when they contain exactly the same characters:
```
>>> print(b'foo' == 'foo')
False
```
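
A hedged sketch of how to get a meaningful comparison: convert one side so that both operands have the same type. It assumes the bytes hold UTF-8-encoded text; the variable names are illustrative.

```python
raw = b'foo'
text = 'foo'

# Mixed-type equality is always False, regardless of the contents.
mixed = (raw == text)
print(mixed)

# Converting to a common type first gives the comparison its intended meaning.
as_str = (raw.decode('utf-8') == text)
as_bytes = (raw == text.encode('utf-8'))
print(as_str, as_bytes)
```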
The % operator works with format strings of each type:
```
>>> print(b'red %s'%b'blue')
b'red blue'
>>> print('red %s'%'blue')
red blue
```
But you can't pass a str instance to a bytes format string, because Python doesn't know which binary text encoding to use:
```
>>> print(b'red %s'%'blue')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: %b requires a bytes-like object, or an object that implements __bytes__, not 'str'
```
You can pass a bytes instance to a str format string using the % operator, but it doesn't do what you'd expect:
```
>>> print('red %s'%b'blue')
red b'blue'
```
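This code actually invokes the __repr__ method on the bytes instance (see Item 75: "Use repr Strings for Debugging Output") and substitutes the result for %s, which is why b'blue' remains escaped in the output.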
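
If the intent is to get the text itself into the formatted string, one option is to decode the bytes before formatting. This is a small sketch that assumes the bytes are UTF-8-encoded; the variable names are illustrative.

```python
color = b'blue'

# Formatting the bytes object directly embeds its repr.
surprising = 'red %s' % color
print(surprising)

# Decoding first produces the text that was probably intended.
formatted = 'red %s' % color.decode('utf-8')
print(formatted)
```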

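The second issue is that operations involving file handles (returned by the open built-in function) default to requiring Unicode strings instead of raw bytes.
This can cause surprising failures, especially for programmers accustomed to Python 2. For example, say that I want to write some binary data to a file. This seemingly simple code breaks: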
```
>>> with open('data.bin', 'w') as f:
...     f.write(b'\xf1\xf2\xf3\xf4\xf5')
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
TypeError: write() argument must be str, not bytes
```
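The cause of the exception is that the file was opened in write text mode ('w') instead of write binary mode ('wb'). In text mode, write expects a str; in binary mode, it expects bytes. Opening the file with 'wb' fixes the problem: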
```
>>> with open('data.bin', 'wb') as f:
...     f.write(b'\xf1\xf2')
... 
2
```
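A similar problem exists when reading data from files. Opening a file in text mode ('r') makes Python decode its contents with the system's default text encoding; where that default is UTF-8, the bytes written above are not valid UTF-8 and reading them raises UnicodeDecodeError. Opening the file in binary mode ('rb') returns the raw bytes instead.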
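
The read-side problem can be reproduced with the sketch below. It writes the bytes to a temporary file (a choice made here so the example leaves nothing behind), passes encoding='utf-8' explicitly so the failure does not depend on the platform default, and then reads the same file in binary mode.

```python
import os
import tempfile

payload = b'\xf1\xf2\xf3\xf4\xf5'

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'data.bin')
    with open(path, 'wb') as f:
        f.write(payload)

    # Text mode tries to decode the contents; these bytes are not valid UTF-8.
    try:
        with open(path, 'r', encoding='utf-8') as f:
            f.read()
        text_mode_failed = False
    except UnicodeDecodeError:
        text_mode_failed = True

    # Binary mode performs no decoding and returns the bytes unchanged.
    with open(path, 'rb') as f:
        data = f.read()

print(text_mode_failed)
print(data == payload)
```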

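Alternatively, I can explicitly specify the encoding parameter to the open function to make sure that I'm not surprised by any platform-specific behavior. For example, here I assume that the binary data in the file was actually meant to be a string encoded as 'cp1252' (a legacy Windows encoding):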
```
>>> with open('data.bin', 'r', encoding='cp1252') as f:
...     data = f.read()
... 
>>> data
'ñò'
```
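No exception is raised, but the string interpretation of the file's contents is very different from what is returned when reading the raw bytes.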

The lesson here is that you should check the default encoding on your system (using python3 -c 'import locale; print(locale.getpreferredencoding())') to understand how it differs from your expectations.

When in doubt, you should explicitly pass the encoding parameter to open.
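
The following sketch combines both pieces of advice: it prints the preferred encoding reported by the locale module, then writes and reads text with an explicit encoding so the result does not depend on that value. The temporary directory and the file name notes.txt are assumptions made for the example.

```python
import locale
import os
import tempfile

# What text-mode open() would typically fall back to on this machine.
default_encoding = locale.getpreferredencoding()
print(default_encoding)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'notes.txt')

    # Passing encoding explicitly makes the on-disk bytes predictable.
    with open(path, 'w', encoding='utf-8') as f:
        f.write('a\u0300 propos')
    with open(path, 'r', encoding='utf-8') as f:
        text = f.read()

    # Reading in binary mode shows the bytes that were actually stored.
    with open(path, 'rb') as f:
        raw = f.read()

print(text == 'a\u0300 propos')
print(raw)
```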

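Things to Remember
- bytes contains sequences of 8-bit values, and str contains sequences of Unicode code points.
- Use helper functions to ensure that the inputs you operate on are the type of character sequence that you expect (8-bit values, UTF-8-encoded strings, Unicode code points, etc.).
- bytes and str instances can't be used together with operators (such as >, ==, +, and %).
- If you want to read or write binary data to or from a file, always open the file in a binary mode (such as 'rb' or 'wb').
- If you want to read or write Unicode data to or from a file, be careful about your system's default text encoding. Explicitly pass the encoding parameter to open if you want to avoid surprises.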

Original article: https://www.cnblogs.com/zyl007/p/13027856.html
