• [Repost] [Python] Fixing 403 Forbidden when downloading files with urlopen/urlretrieve in Python


    1. urlopen returns 403

    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-
    import urllib.request

    url = "http://www.google.com/translate_a/t?client=t&sl=zh-CN&tl=en&q=%E7%94%B7%E5%AD%A9"
    # Browser-like request header
    headers = {'User-Agent':'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'}
    # Attach the header to the request so the server sees a browser User-Agent
    req = urllib.request.Request(url=url, headers=headers)
    data = urllib.request.urlopen(req).read()
    print(data)

    2. urlretrieve returns 403 (reposted from: https://www.213.name/archives/1087/comment-page-1)

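    This error occurs because the server has anti-crawler protection enabled. In most cases it is enough to set a header that mimics a browser, but urlretrieve does not provide a headers parameter.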
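    To see the mechanism in isolation, here is a minimal sketch using a toy local server. The handler `UAFilterHandler` and the helpers `fetch_status` and `demo` are hypothetical names made up for this illustration, and the "starts with Mozilla/" rule is only an assumed stand-in for whatever check a real site performs.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class UAFilterHandler(BaseHTTPRequestHandler):
    """Toy server: answers 403 unless the User-Agent looks like a browser."""

    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        if agent.startswith("Mozilla/"):
            status, body = 200, b"ok"
        else:
            status, body = 403, b"forbidden"
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_status(url, headers=None):
    """Return the HTTP status code for url, with optional extra headers."""
    # An empty ProxyHandler keeps the request on the local machine.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    req = urllib.request.Request(url, headers=headers or {})
    try:
        with opener.open(req) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        err.close()
        return err.code


def demo():
    """Start the toy server and request it with and without a browser header."""
    server = HTTPServer(("127.0.0.1", 0), UAFilterHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = "http://127.0.0.1:%d/" % server.server_port
        without_header = fetch_status(url)  # urllib sends "Python-urllib/x.y"
        with_header = fetch_status(url, {"User-Agent": "Mozilla/5.0 (Windows NT 6.3; Win64; x64)"})
        return without_header, with_header
    finally:
        server.shutdown()
        server.server_close()
```

    Calling `demo()` requests the toy server twice: once with urllib's default User-Agent and once with a browser-like one. Real sites may check more than this single header, so treat it only as an illustration of the idea.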

    urlopen can also be used to download a file directly, for example:

    from urllib import request

    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.86 Safari/537.36"}

    def down_pic(url, path):
        """Download url to path, sending a browser-like User-Agent."""
        try:
            req = request.Request(url, headers=headers)
            data = request.urlopen(req).read()
            # The with block closes the file automatically
            with open(path, 'wb') as f:
                f.write(data)
        except Exception as e:
            print(str(e))
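    The function above reads the whole response into memory before writing it out. For large files, a possible variation is to copy the response to disk in chunks. This is only a sketch: `down_file_streaming`, `HEADERS` and the `chunk_size` default are names and values assumed for this example, not part of the original post.

```python
import shutil
import urllib.request

# Assumed browser-like header; any User-Agent string from this post would do.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/59.0.3071.86 Safari/537.36"
}


def down_file_streaming(url, path, chunk_size=64 * 1024):
    """Download url to path chunk by chunk instead of reading it all at once."""
    req = urllib.request.Request(url, headers=HEADERS)
    # Both the response and the output file are closed when the block exits.
    with urllib.request.urlopen(req) as resp, open(path, "wb") as out:
        shutil.copyfileobj(resp, out, chunk_size)
    return path
```

    Unlike `down_pic`, this version lets exceptions propagate, so the caller decides how to handle a 403 or a network failure.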

    Another solution:

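    import urllib.request
    # url is the download address and path the local file name; both are defined by the caller
    opener = urllib.request.build_opener()
    opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1941.0 Safari/537.36')]
    # install_opener makes this opener the global default, so urlretrieve sends its headers
    urllib.request.install_opener(opener)
    urllib.request.urlretrieve(url, path)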
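    Note that install_opener changes the default opener for every later urlopen/urlretrieve call in the process. If that side effect is unwanted, one option is to call the opener directly instead of installing it. A minimal sketch, where `retrieve_with_opener` is a hypothetical helper name:

```python
import shutil
import urllib.request


def retrieve_with_opener(url, path, user_agent):
    """Download url to path through a private opener, without install_opener."""
    opener = urllib.request.build_opener()
    # Assigning addheaders replaces the default "Python-urllib/x.y" User-Agent entry.
    opener.addheaders = [("User-Agent", user_agent)]
    with opener.open(url) as resp, open(path, "wb") as out:
        shutil.copyfileobj(resp, out)
    return path
```

    This behaves much like urlretrieve for simple downloads, but it does not return the response headers or support a progress hook.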

    Also, here are some user_agents:

    user_agents = [
        'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11',
        'Opera/9.25 (Windows NT 5.1; U; en)',
        'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)',
        'Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (like Gecko) (Kubuntu)',
        'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.12) Gecko/20070731 Ubuntu/dapper-security Firefox/1.5.0.12',
        'Lynx/2.8.5rel.1 libwww-FM/2.14 SSL-MM/1.4.1 GNUTLS/1.2.9',
        "Mozilla/5.0 (X11; Linux i686) AppleWebKit/535.7 (KHTML, like Gecko) Ubuntu/11.04 Chromium/16.0.912.77 Chrome/16.0.912.77 Safari/535.7",
        "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:10.0) Gecko/20100101 Firefox/10.0",
    ]
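    One way such a list might be used is to pick a User-Agent at random for each request. The sketch below is an assumption about usage, not something the original post spells out: `build_request` and `USER_AGENTS` are hypothetical names, and the list holds only a few entries copied from the list above.

```python
import random
import urllib.request

# A few entries copied from the user_agents list; extend as needed.
USER_AGENTS = [
    "Opera/9.25 (Windows NT 5.1; U; en)",
    "Lynx/2.8.5rel.1 libwww-FM/2.14 SSL-MM/1.4.1 GNUTLS/1.2.9",
    "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:10.0) Gecko/20100101 Firefox/10.0",
]


def build_request(url, agents=USER_AGENTS):
    """Return a Request carrying a randomly chosen User-Agent header."""
    headers = {"User-Agent": random.choice(agents)}
    return urllib.request.Request(url, headers=headers)
```

    The returned object can be passed straight to `urllib.request.urlopen`. urllib stores the header name as "User-agent", so `req.get_header("User-agent")` shows which string was picked.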

    Original article: https://blog.csdn.net/qq_34309753/article/details/81502529

  • Original URL: https://www.cnblogs.com/mqxs/p/9978179.html