• Python encoding issues


    Linux and macOS default to UTF-8.

    Windows (with a Simplified Chinese locale) defaults to GBK.
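
    To see which defaults apply on a given machine, the standard library can be queried directly. This is a minimal sketch for Python 3; the locale value depends on the platform and its settings, so no specific output is assumed for it:

```python
import locale
import sys

# Codec used by str.encode() / bytes.decode() when none is given.
# On Python 3 this is 'utf-8' on every platform.
default_codec = sys.getdefaultencoding()

# The locale's preferred encoding, which open() has traditionally used for
# text files when no encoding= argument is passed. Typically UTF-8 on
# Linux/macOS and a legacy code page such as cp936 (GBK) on a Simplified
# Chinese Windows install.
locale_codec = locale.getpreferredencoding(False)

print("str/bytes default:", default_codec)
print("locale preferred: ", locale_codec)
```

    Passing encoding= explicitly to open(), as the examples in this post do, avoids depending on the locale value at all.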

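    Python 2

    Python 2 assumes ASCII as the default source encoding. String literals read from a source file are byte strings that follow the file's encoding declaration: whatever encoding is declared is the encoding they are stored in.

    GBK --decode('gbk')--> unicode --encode('utf-8')--> UTF-8

    UTF-8 --decode('utf-8')--> unicode --encode('gbk')--> GBK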
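
    The two conversion chains above can be sketched as follows. The snippet is written in Python 3 syntax, where the intermediate "unicode" object is a str and the encoded forms are bytes; the sample text is an arbitrary choice for illustration:

```python
text = "中文"  # arbitrary sample text

# Start from GBK-encoded bytes, as a file saved on Chinese Windows might hold.
gbk_bytes = text.encode("gbk")

# GBK --decode('gbk')--> unicode --encode('utf-8')--> UTF-8
utf8_bytes = gbk_bytes.decode("gbk").encode("utf-8")

# UTF-8 --decode('utf-8')--> unicode --encode('gbk')--> GBK
gbk_again = utf8_bytes.decode("utf-8").encode("gbk")

# For these two characters GBK needs 2 bytes each and UTF-8 needs 3 each.
print(len(gbk_bytes), len(utf8_bytes))
```

    The point is that conversion between two byte encodings always passes through Unicode text; there is no direct GBK-to-UTF-8 step.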

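    Python 3

    Python 3 uses Unicode by default. When it reads the string literals in a source file, whatever the file's encoding, it first converts them to Unicode; in other words, every string (str) used in Python 3 is Unicode.

    Python 3 also has a second type, bytes, which is used for storage and network transmission.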
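
    A short sketch of the str/bytes split in Python 3. The sample string is arbitrary, and UTF-8 is used here only as one possible codec:

```python
s = "编码"             # str: a sequence of Unicode code points
b = s.encode("utf-8")  # bytes: the form written to disk or sent over a socket

print(type(s).__name__, len(s))  # length counted in characters
print(type(b).__name__, len(b))  # length counted in bytes

# Decoding with the same codec restores the original text.
restored = b.decode("utf-8")

# Mixing the two types raises an error instead of converting implicitly.
try:
    s + b
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(restored == s, mixed_ok)
```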

    Garbled text with requests

    For example:

    import requests

    # 1. Specify the URL
    url = 'https://www.baidu.com'

    # 2. Send a GET request; this returns a response object
    response = requests.get(url=url)

    # 3. Read the response body: .text is str, .content is bytes
    response_text = response.text

    with open('./re2.html', "w", encoding="utf-8") as f:
        f.write(response_text)

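    The re2.html written by the code above comes out garbled.

    Cause:

    response.text decoded the page data to Unicode as 'latin1' (ISO-8859-1), the encoding requests falls back to when the Content-Type header declares a text type without a charset. The data the page actually sent is UTF-8 encoded, so response.text comes out garbled.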
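
    The header-based fallback can be approximated without any network access. The sketch below is a simplified, hypothetical stand-in for that check, not the library's actual code; the function name guess_encoding and the sample header values are assumptions made for illustration:

```python
from email.message import Message


def guess_encoding(content_type):
    """Pick a text encoding from a Content-Type header value.

    Hypothetical helper: returns the declared charset if there is one,
    'ISO-8859-1' for text/* without a charset, and None otherwise.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_param("charset")
    if charset:
        return charset.strip("'\"")
    if msg.get_content_maintype() == "text":
        return "ISO-8859-1"
    return None


print(guess_encoding("text/html; charset=utf-8"))
print(guess_encoding("text/html"))
print(guess_encoding("image/png"))
```

    A page served as plain "text/html" therefore ends up decoded as Latin-1 even when its bytes are UTF-8, which matches the symptom seen with re2.html.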

    Solutions:

    1. If Requests cannot detect the correct encoding, tell it what the correct one is

    import requests

    # 1. Specify the URL
    url = 'https://www.baidu.com'

    # 2. Send a GET request; this returns a response object
    response = requests.get(url=url)

    # Explicitly set the encoding used to decode the body to UTF-8
    response.encoding = 'utf-8'

    print(type(response))

    # 3. Read the response body: .text is str, .content is bytes
    response_text = response.text

    with open('./re3.html', "w", encoding="utf-8") as f:
        f.write(response_text)

    2. Re-encode the wrongly decoded Unicode data back to bytes, using the same (wrong) codec it was decoded with

    import requests

    # 1. Specify the URL
    url = 'https://www.baidu.com'

    # 2. Send a GET request; this returns a response object
    response = requests.get(url=url)

    # Manually setting the encoding to UTF-8 is left disabled here
    # response.encoding = 'utf-8'

    # 3. Read the response body: .text is str, .content is bytes.
    #    Encode response.text with 'latin-1' to get the original bytes back.
    response_text = response.text.encode('latin-1')

    with open('./re3.html', "wb") as f:
        f.write(response_text)
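
    Why this second approach works can be shown offline: Latin-1 maps each of the 256 byte values to exactly one code point, so encoding the garbled string with it gives back the original bytes. A minimal sketch, with a made-up HTML fragment standing in for the downloaded page:

```python
# Bytes as a server would send them (UTF-8); the content is a made-up sample.
raw = "<title>百度一下</title>".encode("utf-8")

# What response.text amounts to when the body is decoded as Latin-1.
garbled = raw.decode("latin-1")

# Re-encoding with the same wrong codec recovers the original bytes...
recovered = garbled.encode("latin-1")

# ...which can then be decoded correctly.
fixed = recovered.decode("utf-8")

# Latin-1 round-trips every possible byte value without loss.
all_bytes = bytes(range(256))
lossless = all_bytes.decode("latin-1").encode("latin-1") == all_bytes

print(garbled != fixed, recovered == raw, lossless)
```

    The trick relies on the wrong codec being lossless; a codec that rejects or merges some byte values would not guarantee the original bytes come back.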

    3. Use response.content directly to get the data as bytes

    import requests

    # 1. Specify the URL
    url = 'https://www.baidu.com'

    # 2. Send a GET request; this returns a response object
    response = requests.get(url=url)

    # Manually setting the encoding to UTF-8 is left disabled here
    # response.encoding = 'utf-8'

    # 3. Read the response body: .text is str, .content is bytes
    response_content = response.content

    with open('./re3.html', "wb") as f:
        f.write(response_content)
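
    When working from response.content, the bytes sometimes still need to be turned into text later. One possible approach, sketched here with a hypothetical helper decode_page and an assumed candidate list of UTF-8 followed by GBK, is to try strict decoding with each codec in turn:

```python
def decode_page(data, candidates=("utf-8", "gbk")):
    """Return (text, codec) for the first candidate that decodes cleanly.

    Hypothetical helper; the candidate list is an assumption and should be
    adjusted to the sites being fetched.
    """
    for codec in candidates:
        try:
            return data.decode(codec), codec
        except UnicodeDecodeError:
            continue
    # Last resort: decode anyway, marking undecodable bytes with U+FFFD.
    return data.decode(candidates[0], errors="replace"), candidates[0]


# Sample pages built locally so the sketch runs without a network.
utf8_page = "中文网页".encode("utf-8")
gbk_page = "中文网页".encode("gbk")

print(decode_page(utf8_page))
print(decode_page(gbk_page))
```

    UTF-8 is tried first because it is strict about byte patterns, whereas GBK will often accept UTF-8 bytes and silently produce the wrong characters.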
  • Original article: https://www.cnblogs.com/hougang/p/code.html