Python encoding issues

Linux and macOS default to UTF-8.

Windows defaults to GBK (on Simplified Chinese systems; the default follows the system locale).
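
To check which defaults apply on a particular machine, the standard library can report them. This is a minimal sketch assuming Python 3; the locale value depends on the OS and its settings, so no output is shown for it.

```python
import locale
import sys

# Codec Python uses for str <-> bytes conversions when none is given.
# In Python 3 this is 'utf-8'.
default_codec = sys.getdefaultencoding()

# Codec preferred by the OS locale. Text-mode open() typically falls back
# to it when no encoding argument is passed.
locale_codec = locale.getpreferredencoding(False)

print("str/bytes default:", default_codec)
print("locale preferred: ", locale_codec)
```

Passing encoding="utf-8" explicitly to open(), as the snippets in this post do, avoids depending on the locale value.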

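Python 2

Python 2 uses ASCII as its default encoding. String literals that Python 2 reads from a source file are kept in whatever encoding the file's coding declaration specifies: the declared encoding is the encoding of the string.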

GBK --decode('gbk')--> unicode --encode('utf-8')--> UTF-8

UTF-8 --decode('utf-8')--> unicode --encode('gbk')--> GBK
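
The two conversion chains above can be tried directly. The sketch below uses Python 3, where the intermediate "unicode" value is a plain str; the sample text is an arbitrary choice for illustration.

```python
text = "编码"  # a Unicode string (str in Python 3)

# GBK bytes -> decode('gbk') -> str -> encode('utf-8') -> UTF-8 bytes
gbk_bytes = text.encode("gbk")
utf8_from_gbk = gbk_bytes.decode("gbk").encode("utf-8")

# UTF-8 bytes -> decode('utf-8') -> str -> encode('gbk') -> GBK bytes
utf8_bytes = text.encode("utf-8")
gbk_from_utf8 = utf8_bytes.decode("utf-8").encode("gbk")

print(utf8_from_gbk == utf8_bytes)       # True
print(gbk_from_utf8 == gbk_bytes)        # True
print(len(gbk_bytes), len(utf8_bytes))   # 4 6
```

The same two characters take 4 bytes in GBK and 6 in UTF-8, which is why bytes in one encoding cannot be read as the other without going through the Unicode step.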

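Python 3

Python 3 uses Unicode by default. When Python 3 reads string literals from a source file, whatever the file's encoding, it first decodes them to Unicode. In other words, every string (str) used in Python 3 is a Unicode string.

Python 3 also has a second type, bytes, which is used for storage and network transmission.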
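
A short illustration of the two types, assuming Python 3 and an arbitrary sample string: str counts characters, bytes counts encoded bytes, and encode/decode moves between them.

```python
s = "中文"                 # str: a sequence of Unicode code points
b = s.encode("utf-8")     # bytes: the form written to disk or sent over a socket

print(type(s).__name__, len(s))   # str 2
print(type(b).__name__, len(b))   # bytes 6
print(b.decode("utf-8") == s)     # True
```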

Garbled text with requests

For example:

import requests

# 1. Specify the URL
url = 'https://www.baidu.com'

# 2. Send a GET request; a response object is returned
response = requests.get(url=url)

# 3. Get the response body: .text is str, .content is bytes
response_text = response.text

with open('./re2.html', "w", encoding="utf-8") as f:
    f.write(response_text)

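The re2.html file written by the code above contains garbled text.

Cause:

response.text decodes the page data to Unicode using 'latin1' (ISO-8859-1), which is what Requests falls back to when the Content-Type header declares a text type without a charset. The data actually returned for the page is UTF-8 encoded, so response.text comes out garbled.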
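
The effect can be reproduced offline without any network request. In this sketch the sample string stands in for the page body (an assumption for illustration); decoding its UTF-8 bytes as latin-1 raises no error but produces different, unreadable text.

```python
original = "百度一下"              # text as the server intended it
raw = original.encode("utf-8")     # the bytes that travel over the network

# latin-1 assigns a character to every byte value, so this decode never
# fails; each UTF-8 byte simply becomes its own (wrong) character.
garbled = raw.decode("latin-1")

print(len(original), len(raw), len(garbled))   # 4 12 12
print(garbled == original)                     # False
```

Four characters turn into twelve, one per byte, and writing that string to a file as UTF-8 stores the garbled form rather than the original text.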

Solutions:

1. If Requests does not detect the correct encoding, tell it what the correct one is by setting response.encoding.

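import requests

# 1. Specify the URL
url = 'https://www.baidu.com'

# 2. Send a GET request; a response object is returned
response = requests.get(url=url)

# Tell Requests which encoding to use when decoding .text
response.encoding = 'utf-8'

print(type(response))  # <class 'requests.models.Response'>
# 3. Get the response body: .text is str, .content is bytes
response_text = response.text

with open('./re3.html', "w", encoding="utf-8") as f:
    f.write(response_text)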

2. Re-encode the wrongly decoded Unicode text back into bytes, using the same (wrong) codec it was decoded with.

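import requests

# 1. Specify the URL
url = 'https://www.baidu.com'

# 2. Send a GET request; a response object is returned
response = requests.get(url=url)

# Manually setting the encoding to utf-8 is left disabled here
# response.encoding = 'utf-8'

# 3. Get the response body (.text is str, .content is bytes) and
#    encode response.text back to bytes with 'latin-1'
response_text = response.text.encode('latin-1')

# Binary mode: the recovered UTF-8 bytes are written unchanged
with open('./re3.html', "wb") as f:
    f.write(response_text)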
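
Why this round trip works can be checked offline. The sketch below assumes the wrong decode really was latin-1 and uses a sample string in place of the page body.

```python
raw = "百度一下".encode("utf-8")    # stand-in for the bytes the server sent
garbled = raw.decode("latin-1")     # what a wrong latin-1 decode yields

# latin-1 maps code points 0-255 one-to-one onto byte values, so encoding
# with it again restores the original bytes exactly.
recovered_bytes = garbled.encode("latin-1")
recovered_text = recovered_bytes.decode("utf-8")

print(recovered_bytes == raw)   # True
print(recovered_text)           # 百度一下
```

If the text was decoded with some other codec, encode('latin-1') may raise UnicodeEncodeError or return different bytes, so this approach depends on knowing which codec was applied.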

3. Use response.content directly to get the data as raw bytes.

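import requests

# 1. Specify the URL
url = 'https://www.baidu.com'

# 2. Send a GET request; a response object is returned
response = requests.get(url=url)

# Manually setting the encoding to utf-8 is left disabled here
# response.encoding = 'utf-8'

# 3. Get the response body: .text is str, .content is bytes
response_content = response.content

# Binary mode: the bytes are written exactly as received
with open('./re3.html', "wb") as f:
    f.write(response_content)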
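
The same idea can be verified without a network call. In this sketch a hard-coded byte string stands in for response.content, and the temporary file path is hypothetical; the bytes are written in binary mode and then read back as UTF-8 text.

```python
import os
import tempfile

# Stand-in for response.content: UTF-8 bytes as received from a server
raw = "<title>百度一下</title>".encode("utf-8")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "page.html")
    with open(path, "wb") as f:                    # no decoding step involved
        f.write(raw)
    with open(path, "r", encoding="utf-8") as f:   # decode only when reading
        round_tripped = f.read()

print(round_tripped)   # <title>百度一下</title>
```

Because no decode happens before the write, a wrong guess about the encoding cannot corrupt the file.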

Original source: https://www.cnblogs.com/hougang/p/code.html