• Getting started with web scraping in Python 3


    import urllib.request

    url = "http://www.iciba.com/publish"

    # Browser-like request headers.
    headers = {
        "Host": "www.iciba.com",
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2",
        # "Accept-Encoding": "gzip, deflate",
    }

    request = urllib.request.Request(url=url, headers=headers)
    response = urllib.request.urlopen(request)

    # decode() assumes UTF-8 and fails if the body is still compressed.
    print(response.read().decode())
    

    Error:

    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
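
The same failure can be reproduced without any network access. The following is a minimal sketch, assuming a made-up HTML string (`text` is illustrative and not taken from the site above): it compresses the string with gzip, shows that the result starts with the bytes 0x1f 0x8b, and then tries to decode the compressed bytes as UTF-8.

```python
import gzip

# Hypothetical page content, used only for illustration.
text = "<html>hello</html>"
compressed = gzip.compress(text.encode("utf-8"))

# Every gzip stream starts with the magic bytes 0x1f 0x8b.
print(compressed[:2].hex())

# Decoding the compressed bytes directly raises the error shown above:
# 0x1f is valid ASCII, but 0x8b at position 1 cannot start a UTF-8 character.
error = None
try:
    compressed.decode()
except UnicodeDecodeError as exc:
    error = exc
    print(exc)

# Decompressing first gives back the original text.
restored = gzip.decompress(compressed).decode("utf-8")
print(restored == text)
```

If a response body begins with these two bytes, it is very likely gzip data that still needs to be decompressed.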

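    [Solution] The response body was gzip-compressed and was never decompressed. Byte 0x8b at position 1 is the second byte of the gzip magic number (0x1f 0x8b), so the body must be decompressed before it is decoded as text.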

    import gzip
    import urllib.request

    url = "https://www.baidu.com"
    headers = {
        "Host": "www.baidu.com",
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2",
        # Advertise only gzip, the one encoding this script can decompress.
        "Accept-Encoding": "gzip",
    }

    request = urllib.request.Request(url=url, headers=headers)
    response = urllib.request.urlopen(request)
    content = response.read()

    # Read the Content-Encoding response header.
    encoding = response.info().get("Content-Encoding")

    # Decompress only when the server actually compressed the body,
    # then print the page in either case.
    if encoding == "gzip":
        content = gzip.decompress(content)
    print(content.decode())
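
When a request advertises "gzip, deflate", the server may answer with either encoding. The following is a rough sketch of one way to handle both, not part of the original post: `decode_body` is a hypothetical helper name, and the fallback to a raw deflate stream rests on the assumption that some servers send deflate data without the zlib wrapper.

```python
import gzip
import zlib


def decode_body(raw, content_encoding=None, charset="utf-8"):
    """Decompress raw bytes according to Content-Encoding, then decode to str."""
    encoding = (content_encoding or "").strip().lower()
    if encoding == "gzip":
        raw = gzip.decompress(raw)
    elif encoding == "deflate":
        try:
            # Deflate data with a zlib header and checksum.
            raw = zlib.decompress(raw)
        except zlib.error:
            # Assumption: the server sent a raw deflate stream instead.
            raw = zlib.decompress(raw, -zlib.MAX_WBITS)
    # Undecodable bytes are replaced rather than raising an exception.
    return raw.decode(charset, errors="replace")


if __name__ == "__main__":
    body = "<html>hello</html>".encode("utf-8")
    print(decode_body(gzip.compress(body), "gzip"))
    print(decode_body(zlib.compress(body), "deflate"))
    print(decode_body(body))
```

With the script above, the call could look like `decode_body(response.read(), response.headers.get("Content-Encoding"), response.headers.get_content_charset() or "utf-8")`, which also takes the character set from the Content-Type header instead of assuming UTF-8.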
    
  • Original article: https://www.cnblogs.com/liwuming/p/10851045.html