• [Python Web Analysis] Handling Redirects with the httplib Library


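    1. Page flow

    The screenshot below shows the packet capture of the real browser session; the other steps are not described here.

    1) The flow starts with the selected POST /main.aspx

    2) The server then replies with a 302 redirect to the /cd_chose.aspx page

    3) The capture shows a GET of the redirect URL; the GETs for the css and js files are not covered here

    4) A POST to /cd_chose.aspx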

    image
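
    The four steps above can be sketched with Python 3's http.client, the module that httplib was renamed to. This is a minimal sketch rather than the code used in this post: post_then_follow and location_to_path are hypothetical helper names, and it assumes the redirect stays on the same host, so only the path part of the Location header is needed.

```python
from urllib.parse import urlsplit


def location_to_path(location):
    """Reduce a Location header value to the path (plus query) to request next."""
    parts = urlsplit(location)
    path = parts.path or "/"
    return path + ("?" + parts.query if parts.query else "")


def post_then_follow(conn, url, body, headers):
    """POST a form; if the server answers 302, GET the redirect target.

    conn is an http.client.HTTPConnection (or any object with the same
    request/getresponse methods). Returns (final_status, last_requested_path).
    """
    # Content-Type describes the form body, so it is added to the POST only.
    post_headers = dict(headers)
    post_headers["Content-Type"] = "application/x-www-form-urlencoded"
    conn.request("POST", url, body, headers=post_headers)
    response = conn.getresponse()
    location = response.getheader("Location", "")
    response.read()  # drain the body so the same connection can be reused

    if response.status != 302 or not location:
        return response.status, url

    # Follow the redirect with a plain GET that carries only the shared headers.
    target = location_to_path(location)
    conn.request("GET", target, headers=dict(headers))
    response = conn.getresponse()
    response.read()
    return response.status, target


# Usage (hypothetical host and form data):
#   from http.client import HTTPConnection
#   conn = HTTPConnection("example.test", timeout=60)
#   print(post_then_follow(conn, "/main.aspx", "a=1", {"Accept": "*/*"}))
```

    The later POST to /cd_chose.aspx would be a second call with that page's form data.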

    2. Simulating the flow in Python

    2.1 Packet capture analysis: the follow-up GET request does not go through

    image

    Next, look at the capture taken from IE

    image

    No GET request appears in it

    image

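    I suspected that a direct POST was required, but trying that still failed. A closer look at the POST content showed a GET header among the headers; since I am not familiar with how IE displays headers, I did not dig further.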

    image

    2.2 Checking the message format

    The HTTP headers had already been defined before the GET of the redirect page:

    image

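    Comparing them with the headers that the real browser session sent successfully, I found that my Python code defined one extra header, "Content-Type", mainly because the preceding POST request needs it.

    In the real flow, other earlier messages need this header, but this message does not.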

    image

    After removing this header,

    the message flow from Python looks normal

    image

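    This problem cost me a lot of time because my HTTP fundamentals were shaky: I was unsure which direction to take and kept assuming that the redirect flow involved some other complicated handling. In the end it came down to a single header.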
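
    The header comparison that located the problem can also be done mechanically. The sketch below is an illustration only: diff_headers is a hypothetical helper, and the header values are made up rather than copied from the capture. Header names are compared case-insensitively, as HTTP requires.

```python
def diff_headers(reference, candidate):
    """Compare two header dicts by name.

    Returns (extra, missing): names present only in candidate,
    and names present only in reference, both lowercased and sorted.
    """
    ref_names = {name.lower() for name in reference}
    cand_names = {name.lower() for name in candidate}
    return sorted(cand_names - ref_names), sorted(ref_names - cand_names)


# Headers of the GET as sent by the browser (illustrative values).
browser_get = {
    "Accept": "*/*",
    "Referer": "http://example.test/main.aspx",
    "Cookie": "ASP.NET_SessionId=abc",
}

# Headers of the same GET as sent by the script, reusing the POST headers.
script_get = {
    "accept": "*/*",
    "Referer": "http://example.test/main.aspx",
    "Cookie": "ASP.NET_SessionId=abc",
    "Content-Type": "application/x-www-form-urlencoded",
}

extra, missing = diff_headers(browser_get, script_get)
print("extra in script:", extra)
print("missing in script:", missing)
```

    Any name reported as extra is a candidate for removal from that particular request.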

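    Finally, here are my wrapped http get and post methods, built on the httplib library. They are flexible and convenient: following the front-end js code, you can generate the special fields that the server uses for authentication yourself.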

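    # Python 2. Module-level imports required by the two methods below.
    # The methods belong to a class that provides self.host and self.headers.
    import re
    import time
    from copy import deepcopy
    from gzip import GzipFile
    from httplib import HTTPConnection
    from StringIO import StringIO

        def http_get(self,connDefault=None,url='',bodyFlag=False,refererFresh=False,referer = ''):

            status,infor = 1,''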
            if connDefault is None:
                conn = HTTPConnection(self.host,timeout=60)
            else:
                conn = connDefault

            try:

                print 'http_get -> enter to get ',url
                start = time.time()           
               
                print 'http_get -> connect init OK'
                conn.request('GET',url,headers=self.headers)

                print 'http_get -> wait the  response...'
                response = conn.getresponse()
                end = time.time()
                print "http_get -> info:",end - start,response.status

                print 'http_get -> response headers' ,response.getheaders()

                # Status code
                status = response.status
                if status != 200:
                    print 'http_get -> http status error',status
                    infor = 'error'

                else:
                    # Read the cookie; the format is: ASP.NET_SessionId=pzt0bs55tc2fjrbv0canht45; path=/; HttpOnly
                    cookie=response.getheader('Set-Cookie','')
                    #print "http_get -> cookie -> ",cookie

                    # Accumulate cookies
                    if cookie != '':
                        # Two kinds of cookie keys are expected
                        print 'http_get -> peer Set-Cookie'  , cookie
                        # \w restored: without the backslash the class only matched the letter "w"
                        pattern = re.compile(r'(key=[\w=+/]+;|ASP\.NET_SessionId=[\w=+/]+;)')
                        _list = pattern.search(cookie)
                        #print 'http_get -> _list',_list
                        if _list is not None:
                            #print 'http_get -> _list' ,url,_list.groups()
                            oCookie = self.headers.get('Cookie','')
                            if oCookie == '':
                                self.headers["Cookie"] = str(_list.groups()[0][:-1])
                            else:
                                self.headers["Cookie"] = oCookie + ';'  + str(_list.groups()[0][:-1])
                            print 'http_get -> request Cookie' ,self.headers["Cookie"]

                    # Update the Referer
                    if refererFresh:
                        if referer != '':
                            self.headers["Referer"] = "http://" + self.host + referer
                        else:
                            self.headers["Referer"] = "http://" + self.host + url


                    # Read the content encoding; gzip is declared explicitly in the response headers
                    content_encoding = response.getheader('Content-Encoding','')
                    if bodyFlag:
                        # Decompress a gzip body
                        if content_encoding == 'gzip':
                            buf = StringIO(response.read())
                            infor = GzipFile(fileobj=buf).read()
                        else:
                            infor = response.read()

            except Exception,ex:
                print 'http_get -> error:',ex
                status,infor = 1,ex
            finally:
                if connDefault is None:
                    conn.close()
                return status,infor


        def http_post(self,connDefault=None,url='',PostStr=''):
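            status,response = 1,''
            # Pick the connection before the try block so that finally can always reference conn
            if connDefault is None:
                conn = HTTPConnection(self.host,timeout=60)
            else:
                conn = connDefault
            try:
                # Copy the shared headers so that Content-Type is sent with this POST only
                headers = deepcopy(self.headers)
                headers["Content-Type"] ="application/x-www-form-urlencoded"
                start = time.time()

                headers["Content-Length"] = len(PostStr)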
                conn.request('POST',url,PostStr,headers=headers)
                response = conn.getresponse()
                end = time.time()
                print "http_post info:",end - start,response.status
               
                # Redirect
                if response.status == 302:
                    Location=response.getheader('Location','')
                    status,response = 302,Location
                # Normal submission
                elif response.status == 200:
                    status,response = 200,''
                else:
                    status,response = response.status,'does not support'
            except Exception,ex:
                print 'http_post -> error:',ex
                status,response = 1,ex
            finally:
                if connDefault is None:
                    conn.close()
                return status,response
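
    The cookie accumulation in http_get appends to the Cookie string, so a cookie that the server sets again would appear twice. One possible alternative, sketched here in Python 3 and not part of the original code, keeps the cookies in a dict keyed by name. merge_set_cookies and cookie_header are hypothetical helper names, and the sketch ignores cookie attributes such as path and expiry.

```python
def merge_set_cookies(jar, set_cookie_values):
    """Update jar (a dict of name -> value) from raw Set-Cookie header values."""
    for raw in set_cookie_values:
        # Only the first "name=value" pair is the cookie itself;
        # attributes such as path or HttpOnly follow the first ';'.
        pair = raw.split(";", 1)[0].strip()
        if "=" not in pair:
            continue
        name, value = pair.split("=", 1)
        jar[name.strip()] = value.strip()
    return jar


def cookie_header(jar):
    """Build the value of the request Cookie header from the jar."""
    return "; ".join("%s=%s" % (name, value) for name, value in jar.items())


jar = {}
merge_set_cookies(jar, ["ASP.NET_SessionId=pzt0bs55tc2fjrbv0canht45; path=/; HttpOnly"])
merge_set_cookies(jar, ["key=abc+/=; path=/"])
print(cookie_header(jar))

# With http.client, the raw values of a response can be obtained with
#   response.headers.get_all("Set-Cookie") or []
```

    Because the jar is keyed by cookie name, a repeated Set-Cookie replaces the old value instead of duplicating it.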

    The palest ink is better than the best memory.
  • Original article: https://www.cnblogs.com/inns/p/5596689.html