• Python3 Network Learning Case 4: Writing a Web Proxy


    For the definition and purpose of a proxy server, see Baidu Baike.

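    1. Implementation approach for the Web Proxy

    This builds on the previous post, "Writing a Web Server". The main logic is shown in the figure below:

    What we write is the Web Proxy in the middle. When a client sends the Web Proxy a request for some URL (Request), the Web Proxy first checks whether it already holds the requested file. If it does, it returns the file directly (Response). If it does not, the Web Proxy sends a request to the Web Server (the server behind that URL) to fetch the target file, and then returns it to the client.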
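The cache-or-fetch decision described above can be sketched in a few lines. This is only an illustration, not the code used later in this post: the names `serve`, `cache`, and `fetch` are hypothetical, the cache is a plain dictionary instead of files on disk, and `fetch` stands in for the real request to the Web Server.

```python
from typing import Callable, Dict


def serve(url: str, cache: Dict[str, bytes], fetch: Callable[[str], bytes]) -> bytes:
    """Return the response for url, contacting the origin only on a cache miss."""
    if url in cache:
        # Cache hit: answer directly from the stored copy.
        return cache[url]
    # Cache miss: ask the origin server, then keep a copy for next time.
    response = fetch(url)
    cache[url] = response
    return response


if __name__ == "__main__":
    calls = []

    def fake_fetch(url: str) -> bytes:
        # Stand-in for a real network request (hypothetical, for illustration).
        calls.append(url)
        return b"<html>hello</html>"

    store: Dict[str, bytes] = {}
    serve("http://example.com/", store, fake_fetch)  # miss: fetched and cached
    serve("http://example.com/", store, fake_fetch)  # hit: served from the cache
    print(len(calls))  # prints 1, the origin was contacted only once
```

The second call never reaches `fake_fetch`, which is the whole point of putting a caching proxy between the client and the server.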

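    2. Using the Web Proxy

    First, to reach a URL through the proxy we cannot simply open a browser and type in the address (the client would then send the Request straight to the Web Server). Instead, download a tool called Wget; it is to the Web Proxy roughly what the JDK is to Java (other tools can probably talk to a proxy server as well, but they are not discussed here). Once downloaded, unzip it and it is ready to use. As with the JDK, first change to the folder containing the file in a command-line window; if you do not want to type the directory every time, add the file's path to the PATH environment variable (search for how to configure this).

    Once both the Web Proxy and Wget are ready, we can run them:

    First start the Web Proxy program, then use Wget to send a Request through the proxy

    (Wget command: wget xxx.xxx.xx -e use_proxy=on -e http_proxy=127.0.0.1:8000), where "xxx.xxx.xx" is the URL you want to request and 8000 must match the port entered when the proxy was started.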
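If Wget is not at hand, the same kind of proxied request can probably be made from Python's standard library. The sketch below is an alternative to the Wget command, not part of the original post. It assumes the proxy listens on 127.0.0.1:8000 as in that command, and `build_proxy_opener` is a hypothetical helper name.

```python
import urllib.request


def build_proxy_opener(proxy_addr: str = "127.0.0.1:8000") -> urllib.request.OpenerDirector:
    """Build an opener that routes plain-HTTP requests through the given proxy."""
    handler = urllib.request.ProxyHandler({"http": "http://" + proxy_addr})
    return urllib.request.build_opener(handler)


# Usage (requires the proxy to be running; the URL is a placeholder):
#
#     opener = build_proxy_opener()
#     with opener.open("http://example.com/", timeout=5) as resp:
#         print(resp.status)
```

Only `http://` URLs are routed this way; the proxy in this post does not handle HTTPS.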

    3. Results

    wget's response to the request (screenshot):

    The proxy's cache path (screenshot):

    4. Web Proxy source code

    import os
    import socket
    
    
    def handleReq(clientSocket):
        # Receive the request and extract the file name from it.
        # If a file named "fileName" is already cached, return it;
        # otherwise forward the request to the origin server to get it.
    
        recvData = clientSocket.recv(1024).decode("UTF-8")
        fileName = recvData.split()[1].split("//")[1].replace('/', '')
        print("fileName: " + fileName)
        filePath = "./" + fileName.split(":")[0].replace('.', '_')
        try:
            # Cache hit: return the stored response as-is
            with open(os.path.join(filePath, "index.html"), 'rb') as file:
                responseMsg = file.read()
            print("File is found in proxy server.")
            clientSocket.sendall(responseMsg)
            print("Send, done.")
        except OSError:
            print("File does not exist.\nSend request to server...")
            try:
                proxyClientSocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
                serverName = fileName.split(":")[0]
                proxyClientSocket.connect((serverName, 80))
                proxyClientSocket.sendall(recvData.encode("UTF-8"))
                # A single recv returns at most 4096 bytes, so larger responses are truncated
                responseMsg = proxyClientSocket.recv(4096)
                proxyClientSocket.close()
                print("File is found in server.")
                clientSocket.sendall(responseMsg)
                print("Send, done.")
                # Cache the raw bytes so the stored copy matches what the server sent
                os.makedirs(filePath, exist_ok=True)
                with open(os.path.join(filePath, "index.html"), 'wb') as cache:
                    cache.write(responseMsg)
                print("Cache, done.")
            except OSError as e:
                print("Request to server failed: {0}".format(e))
    
    
    def startProxy(port):
        proxyServerSocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        proxyServerSocket.bind(("", port))
        proxyServerSocket.listen(0)
        while True:
            try:
                print("Proxy is waiting for connecting...")
                clientSocket, addr = proxyServerSocket.accept()
                print("Connect established")
                handleReq(clientSocket)
                clientSocket.close()
            except Exception as e:
                print("error: {0}".format(e))
                break
        proxyServerSocket.close()
    
    
    if __name__ == '__main__':
        while True:
            raw = input("choose a port number over 1024:")
            try:
                port = int(raw)
            except ValueError:
                # Report the text that was typed; port is not assigned when int() fails
                print("Please input an integer rather than {0!r}".format(raw))
                continue
            if port <= 1024:
                print("Please input an integer greater than 1024")
                continue
            break
        startProxy(port)
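One limitation of the listing is that `handleReq` derives the host by stripping every `/` from the URL, so a request for a URL with a path (for example `http://host/a/b.html`) yields a host name that does not exist. A possible way around this, sketched here under the assumption that the client sends the usual proxy-style request line with a full URL, is to let `urllib.parse.urlsplit` do the splitting. The function name `parse_proxy_request` is hypothetical, and the cache directory follows the same dots-to-underscores scheme as the listing.

```python
from urllib.parse import urlsplit


def parse_proxy_request(request_line: str):
    """Split 'GET http://host:port/path HTTP/1.1' into (host, port, cache_dir)."""
    method, target, version = request_line.split()
    parts = urlsplit(target)
    host = parts.hostname
    # Fall back to the default HTTP port when the URL does not name one.
    port = parts.port or 80
    # Same naming scheme as the listing: one directory per host, dots replaced.
    cache_dir = "./" + host.replace(".", "_")
    return host, port, cache_dir


if __name__ == "__main__":
    print(parse_proxy_request("GET http://www.example.com/a/b.html HTTP/1.1"))
    # ('www.example.com', 80, './www_example_com')
```

Like the listing, this still keeps a single cache entry per host rather than per path; caching per path would need the path folded into the file name as well.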

    5. Wget toolkit

    Link: https://pan.baidu.com/s/1Ae2_Cq9SYbKnfhhyJ1VhpQ
    Extraction code: awsl

  • Original article: https://www.cnblogs.com/YuanShiRenY/p/Python_Web_Proxy.html