• Python web scraping notes


    Making requests

    URL parsing

    from urllib import parse
    url = "http://www.baidu.com/s?"
    info = {"wd":"kidd"}
    url = url + parse.urlencode(info)
    print(url) #http://www.baidu.com/s?wd=kidd
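
    Going the other way, the same module can split a URL back into its parts. The following is a minimal sketch that assumes the URL built above; urlparse and parse_qs are standard urllib.parse functions, and the variable names are only illustrative.

```python
from urllib import parse

url = "http://www.baidu.com/s?wd=kidd"

# Split the URL into scheme, host, path, query string, and so on.
parts = parse.urlparse(url)
print(parts.netloc)  # www.baidu.com
print(parts.path)    # /s

# Turn the query string back into a dict. Each value is a list
# because a key may appear more than once in a query string.
query = parse.parse_qs(parts.query)
print(query)  # {'wd': ['kidd']}
```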

    URL encoding and decoding

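    Why is this needed?

    If a request contains special characters such as ? = / +, they can clash with the characters that URLs use as delimiters. If you request http://www.baidu.com/s?wd=/a+b=?/ directly, the search will not be for the text you intended.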

    from urllib import parse
    # Encode. Note that quote() leaves "/" unescaped by default.
    url = "http://www.baidu.com/s?wd="
    info = parse.quote("/a+b=?/")
    url += info
    print(url) # http://www.baidu.com/s?wd=/a%2Bb%3D%3F/

    # Decode
    parse_url = parse.unquote(url)
    print(parse_url) # http://www.baidu.com/s?wd=/a+b=?/
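
    The encoded URL above still contains raw "/" characters, because quote treats "/" as safe unless told otherwise. The sketch below compares the default with safe="" and with quote_plus; the sample strings are only illustrative.

```python
from urllib import parse

text = "/a+b=?/"

# Default: "/" is treated as safe and left alone.
default_quoted = parse.quote(text)
# safe="" escapes "/" as well, which is usually what a query value needs.
strict_quoted = parse.quote(text, safe="")
# quote_plus also escapes "/" and additionally turns spaces into "+".
plus_quoted = parse.quote_plus("a b/c")

print(default_quoted)  # /a%2Bb%3D%3F/
print(strict_quoted)   # %2Fa%2Bb%3D%3F%2F
print(plus_quoted)     # a+b%2Fc

# unquote reverses quote; unquote_plus reverses quote_plus.
print(parse.unquote(strict_quoted))     # /a+b=?/
print(parse.unquote_plus(plus_quoted))  # a b/c
```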

    requests does not seem to offer an equivalent. If it does, please let me know.

    POST requests with requests
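
    For what it is worth, requests does percent-encode query values when they are passed through the params argument. The sketch below only prepares the request and never sends it, so it needs no network access; the search term is the same sample string used earlier.

```python
import requests
from urllib import parse

# Build the request without sending it.
req = requests.Request(
    "GET",
    "http://www.baidu.com/s",
    params={"wd": "/a+b=?/"},
)
prepared = req.prepare()

# requests percent-encodes the parameter value while preparing the URL.
print(prepared.url)  # http://www.baidu.com/s?wd=%2Fa%2Bb%3D%3F%2F

# Decoding the query string gives back the original value.
decoded = parse.parse_qs(parse.urlparse(prepared.url).query)
print(decoded)  # {'wd': ['/a+b=?/']}
```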

    When data is not a dict

    import requests

    data = "name=kidd"
    response = requests.post("http://httpbin.org/post",data=data)
    print(response.text)

    Response: the payload ends up in data, and the request headers carry no Content-Type

    {
      "args": {}, 
      "data": "name=kidd", 
      "files": {}, 
      "form": {}, 
      "headers": {
        "Accept": "*/*", 
        "Accept-Encoding": "gzip, deflate", 
        "Content-Length": "9", 
        "Host": "httpbin.org", 
        "User-Agent": "python-requests/2.23.0", 
        "X-Amzn-Trace-Id": "Root=1-5edeee36-d00dd8b083c14254ec60605a"
      }, 
      "json": null, 
      "origin": "39.77.220.193", 
      "url": "http://httpbin.org/post"
    }

    When data is a dict

    data = {"name":"kidd"}
    response = requests.post("http://httpbin.org/post",data=data)
    print(response.text)

    Response: the payload ends up in form. The submission only counts as a successful form post when the data appears in form.

    {
      "args": {}, 
      "data": "", 
      "files": {}, 
      "form": {
        "name": "kidd"
      }, 
      "headers": {
        "Accept": "*/*", 
        "Accept-Encoding": "gzip, deflate", 
        "Content-Length": "9", 
        "Content-Type": "application/x-www-form-urlencoded", 
        "Host": "httpbin.org", 
        "User-Agent": "python-requests/2.23.0", 
        "X-Amzn-Trace-Id": "Root=1-5edeeee5-f0544530bbb1b22824acd930"
      }, 
      "json": null, 
      "origin": "39.77.220.193", 
      "url": "http://httpbin.org/post"
    }
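
    The two responses differ in one request header: only the dict version carries Content-Type: application/x-www-form-urlencoded. The sketch below prepares both requests without sending them to show that difference on the client side. prepare_post is a hypothetical helper written for this illustration, and the idea that httpbin sorts the body into form or data based on that header is an assumption drawn from the two responses above.

```python
import requests

def prepare_post(data, headers=None):
    """Build a POST to httpbin without sending it (hypothetical helper)."""
    return requests.Request(
        "POST", "http://httpbin.org/post", data=data, headers=headers
    ).prepare()

as_string = prepare_post("name=kidd")
as_dict = prepare_post({"name": "kidd"})

# Both bodies are the same nine characters ...
print(as_string.body)  # name=kidd
print(as_dict.body)    # name=kidd

# ... but only the dict version is labelled as form data.
print(as_string.headers.get("Content-Type"))  # None
print(as_dict.headers.get("Content-Type"))    # application/x-www-form-urlencoded

# Setting the header by hand labels the string version the same way.
fixed = prepare_post(
    "name=kidd",
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)
print(fixed.headers["Content-Type"])  # application/x-www-form-urlencoded
```

    If the assumption holds, sending the string body with that header set by hand should also land in form.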
  • Original article: https://www.cnblogs.com/py-peng/p/13070837.html