• Simulating a login with cookies, and a scrape that failed


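    It started when a friend noticed that every MD5 lookup meant going to the website, logging in, and only then running the lookup. That felt tedious, so the idea was to write a script that takes an MD5 value and prints the result directly.

    So off we went.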

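    1 Simulating the login

    As usual, the first step is to submit the login form and capture the traffic (I used Fiddler). The POSTed form was right there, but on a whim I decided that logging in by submitting the form every time was boring. Since I had been writing web code recently, I wanted to try using the cookie instead.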

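    As the capture shows, in this cookie:

    the ntime field of CNZZDATA3819543 is a timestamp,

    user acts like a session token, and everything else stays the same between requests, so the login script can be written: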

    import requests
    from bs4 import BeautifulSoup
    import time
    
    URL = 'http://www.xxx.com/'
    
    
    def get_html(url):
        session = requests.session()
        headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) '
                                 'AppleWebKit/537.36 (KHTML, like Gecko) '
                                 'Chrome/30.0.1581.2 Safari/537.36'}
        cookies = {'ASP.NET_SessionId': "eqsrnjotcaj5qdf5kmrqwgpy",
                   'CNZZDATA3819543': "cnzz_eid=471312766-1484873928-&ntime=%d" % int(time.time()),
                   'FirstVisit': "",
                   '_test': "1",
                   'comefrom': "http://www.xxx.com/login.aspx",
                   'key': "",
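                   'user': "kPXxHtwrSPpCMgZoXs2VrPuwuuCUrDz7dLq5R3/DBEP59eqYGYFa23AZdDPP1KDR9"
                           "rblhGp0HWbYVkOsCg3QoRwWHIQESmZi4KqRlXxfnuZcFsrEta5SwAmrrvhpNvK"
                           "ghSMRdyV7PTmKuagc7m8IZQ=="}
        # Send the captured cookies with a plain GET; no login form is submitted
        return session.get(url, headers=headers, cookies=cookies).text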

    Parsing the HTML of the response then yields the username and email:

    After that, the session can be used for further GET or POST requests.
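
    A minimal sketch of that idea, assuming the cookie values have already been copied from a packet capture. `make_session` is a hypothetical helper (it is not part of the original script) that stores the cookies and the User-Agent on the session once, so that later calls send them automatically. The cookie values below are placeholders.

```python
import requests


def make_session(cookies, user_agent):
    """Return a requests.Session that sends the given cookies on every request."""
    session = requests.Session()
    session.headers.update({'User-Agent': user_agent})
    session.cookies.update(cookies)
    return session


# Placeholder values; real ones would come from the captured login cookies.
session = make_session({'user': 'PLACEHOLDER', '_test': '1'}, 'Mozilla/5.0')
# session.get(URL) or session.post(URL, data=...) would now carry these cookies.
```

    Keeping the cookies on the session avoids passing `cookies=` and `headers=` to every individual request.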

    2 Into the pit: once logged in, run an MD5 query and capture the traffic

    Next, look at the form

     

    Analyzing the form:

    __EVENTTARGET, __EVENTARGUMENT: these two do nothing useful here; both are "" every time.

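    __VIEWSTATE: this one matters. It is not an algorithm in itself but ASP.NET's serialized page state, Base64-encoded and here apparently encrypted. It seems to combine the hash being queried with some other values I could not identify (this is the wall that stopped my crawler).

    __VIEWSTATEGENERATOR: taken literally, this is the generator of the view state above, and I guessed it was part of some encryption scheme. As far as I know, in ASP.NET it is a short identifier of the page that produced the view state (I did not look into it further).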

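    ctl00$ContentPlaceHolder1$TextBoxInput: the value we enter to be cracked

    ctl00$ContentPlaceHolder1$InputHashType: the hash type we select; the default appears to be md5

    The remaining values are not much use either.

    Put plainly, as long as the values of __VIEWSTATE and ctl00$ContentPlaceHolder1$TextBoxInput correspond to and match each other, the request should go through.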
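
    One common way to keep `__VIEWSTATE` and the other hidden fields consistent on ASP.NET WebForms pages is to read them from a freshly fetched copy of the page instead of hard-coding them. The sketch below shows only the parsing step. `collect_hidden_fields` and the sample HTML are hypothetical, and whether this is enough for this particular site was not verified (a field such as HiddenField2 may be filled in by JavaScript).

```python
from bs4 import BeautifulSoup


def collect_hidden_fields(html):
    """Return {name: value} for every <input type="hidden"> in the page."""
    soup = BeautifulSoup(html, 'html.parser')
    fields = {}
    for tag in soup.find_all('input', attrs={'type': 'hidden'}):
        name = tag.get('name')
        if name:
            fields[name] = tag.get('value', '')
    return fields


# Made-up sample page, standing in for the real query form.
sample = '''
<form>
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__VIEWSTATEGENERATOR" value="CA0B0334" />
  <input type="text" name="ctl00$ContentPlaceHolder1$TextBoxInput" />
</form>
'''
payload = collect_hidden_fields(sample)
# Add the hash to look up on top of the hidden fields read from the page.
payload['ctl00$ContentPlaceHolder1$TextBoxInput'] = '21232f297a57a5a743894a0e4a801fc3'
```

    The resulting dict could then be passed as `data=` to a POST made with the same session that fetched the page.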

    3 Finally, here is my failed crawler

    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    
    import requests
    from bs4 import BeautifulSoup
    import time
    
    URL = 'http://www.xxx.com/'
    
    
    def get_html(url):
        session = requests.session()
        headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) '
                                 'AppleWebKit/537.36 (KHTML, like Gecko) '
                                 'Chrome/30.0.1581.2 Safari/537.36'}
        cookies = {'ASP.NET_SessionId': "eqsrnjotcaj5qdf5kmrqwgpy",
                   'CNZZDATA3819543': "cnzz_eid=471312766-1484873928-&ntime=%d" % int(time.time()),
                   'FirstVisit': "",
                   '_test': "1",
                   'comefrom': "http://www.xxxx.com/login.aspx",
                   'key': "",
                   'user': "kPXxHtwrSPpCMgZoXs2VrPuxxuCUrDz7dLq5R3/DBEP59eqYGYFa23AZdDPP1KDR9"
                           "rblhGp0HWbYVkOsCg3QoRwWHIQESmZi4KqRlXxfnuZcFsrEta5SwAmrrvhpNvK"
                           "ghSMRdyV7PTmKuagc7m8IZQ=="}
        playloads = {'__EVENTTARGET': "",
                     '__EVENTARGUMENT': "",
                     '__VIEWSTATE': "XrQ+lfRMi82hZRL/drrjo0zDnT6/XJxrr0iphlxrVrNfVusZC2UHmQL5"
                                    "i4TbbaD8N6zKVxODMamXqkA0k7T1qoNfW9dRGs/V6mEptB90XdBB4Qj1"
                                    "n1jGG/iw+p7BW4oHPanh8mWCH3G5ZWuZM4TADQoGwOuXna0OWtVK/x8k00"
                                    "+zZEwKXi0vI2T9OrysyhkZ8msq/yashFfMyDo+Qwqb3jNJWl8n844E9Kmb4"
                                    "gcBuBmifviw7jvRJjpVQNqDH+Cbee7gMEvFK4rtKxKcCkxIGNvC46F59rl"
                                    "62EfVX81NFVSD0dhGNnF7kP0WRWpcXZRoXrxd2HFodv5beAw8Gwe7IRHr59"
                                    "T8/GmiS3KVRMDXMG9OgAg13mZv9f/LogkuNmPeiIVz9fBifx2D2kUdQQfT5x"
                                    "T0wbqoGQnWqeQcEYndUCp5lA8kCID4V8p0TR3EfrzAHPlxPh7be8yNHL8iHu"
                                    "50wgxJ6BD2W3VoeF3lOShhkpnHYAeQf7TLaCCPtKleCboctIO6dbcgt1KD6S"
                                    "UvJZyWuRRxz/CBAGNEr6piRudKOgnGl+W9nBfJDS4wl3ao3Y3Rvuon0YMz68"
                                    "o+Ef4FOExM300T51rL5HF5e8zyw+V68ISvXAoHJmhzt64j+ht0jOUzLI1UTXo"
                                    "MOg894gucdsH8VOpVNPO5F+4/03JHqi8R4cSHnFu9U9gYpnGBhIhZuzzyiLHj"
                                    "a3gqyHzehKBlWq53eOhXJH/IfVjGZ9ltjZHi9smWCMonqvZRTm0vD6nKCsQWi"
                                    "JILUzb8YrI7xzYgjHihSEyYc3qi9ze6uSwUdeJbQdqKiGVWMWt+gRxi7JZDae"
                                    "SMfN3NvavFtXdyBVyI1KFuP9LBYDYEH1RD6HXqVsblH4C1dIAq7yQnu4L20OzI"
                                    "E841MIiwLdQVAQ9aAwD3wqvPqoBJfqbkMBKQ7xSiDF+FSRacJ/IHOAJkMoqKJe4LY"
                                    "Csh0tPK1tK1pW7xF/X+PtQCQQ+Ldin76t3bpeY2KAQeF5cXEP94DIYydiJBfn4zJv+D"
                                    "QBzb0zRabwy5GBB1YDY9Fxiw34G1rB18yOlTwl2bpFnUArplpB0TwfjGkA7Up2MCrOy"
                                    "s6oDDdRn+1AQOETo7Ych274ymw+ThCzUrJeVNPf5/X2FJCJpqeH0TRCSs+0fxbaljihS9"
                                    "p3t1WqTxTHWKsh4TsZBQsn90kSItZS/dGYhNH/XUVombBi92AhUrokHqQC4b0mGdIRFRzg"
                                    "6l2lF4VfZbDfIayTgnZbT+N9RwcduCZCRWcUupLLcKnCZHuqd7WStG33dTk9IT/5q2xf57G"
                                    "fRDxslLzN1VIDn8Wtcl494OJPSPqr5+FB8mTs24UjM+6IwgVNstkJFIH1urQWl31TVUg"
                                    "nhtrIQEs4MpyeeUUwlV2CCfxP+JTGbZsuMHdd/RDwp9xH28dGQD0cikU8RlCut/XThG"
                                    "W10bPC2akAXO5xmACNBhY9XKvyMzg8D43AFa3xAxV+e9lwPhNHIQCX7c6m/t5rQztzM"
                                    "+TiraaMMGXZVyjFic757VcJHlU5We8r7lWsKBRbrqnIEV6JMi8dzmb5rLYbBbLI4N9Q"
                                    "DIwy5r0HKDmepTjhZY3DIFLkdO9RakjAoiFUs2e9h+wPxBQGQ+UbyWXzfSWa8hXKSGL"
                                    "kw774/Et5XfCPVaDBkqPPzKlX3QoV5ptuRuDCwzLdXpuBePhme64x09L9XOmIYFdaGJ"
                                    "MXjw/tKRTv6AFgGLvZyso+Ch9XLI/j5abcaLyC/nSUdsxexRPkV/wRB5pSsaau43nMn"
                                    "iMpuAVVxwryPTGnnAO38vl26BAo73jlvNvmP0Av22/3P+A2CmCcJt6S5bH7Jcw6S6HJ"
                                    "QXWDtnFGg6sYCi6mzvwmYFcBEeVzOKHJ8f7TxP7n5CbNjXWnBguSFL1UzH83DTcij6s+1lctI"
                                    "fw4NIN7NU5P+qInfSRvBH3754GAuSApuLZHOp/9k8fkkxlA==",
                     '__VIEWSTATEGENERATOR': "CA0B0334",
                     'ctl00$ContentPlaceHolder1$TextBoxInput': "21232f297a57a5a743894a0e4a801fc3",
                     'ctl00$ContentPlaceHolder1$InputHashType': "md5",
                     'ctl00$ContentPlaceHolder1$Button1': "查询",
                     'ctl00$ContentPlaceHolder1$HiddenField1': "",
                     'ctl00$ContentPlaceHolder1$HiddenField2': "gnSxKhU+42ESHE0pCcCyudmYfvxVL2+w4IhvdkwT37OI/"
                                                               "QODVV7mdVAN9puROPjh"}
    
        text = session.post(url, headers=headers, data=playloads, cookies=cookies).text
        session.close()
        return text
    
    
    def parser_html(text):
        soup = BeautifulSoup(text, 'html.parser')
        # .strings is a generator that yields each text fragment inside the tag
        string_gen = (soup.find('div', class_='main')
                          .find('table', id='table3')
                          .find('span', id='ctl00_ContentPlaceHolder1_LabelAnswer')
                          .strings)
        result = list(string_gen)[0]
        return result
    
    if __name__ == '__main__':
        text = get_html(URL)
        print(parser_html(text))
  • Original article: https://www.cnblogs.com/fuzzier/p/6322847.html