• Multi-threaded FTP Downloads in Python: Downloading a File in Blocks with Multiple Threads


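    Python's ftplib module handles FTP operations such as downloading and uploading. Downloading a large file from an FTP server with Python is often slow, so how can the download be sped up? This is where multithreading comes in. In this post I share my multi-threaded download code; the main Python modules it relies on are ftplib and threading.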

    First, the download approach, outlined below:

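    1. Split the file into blocks. For example, to download a single file with 20 threads, the file is transferred in binary mode and divided evenly into 20 blocks, and one thread is started to download each block: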

    import threading

    # Method of the downloader class; self.ftp is an already logged-in ftplib.FTP instance.
    def setupThreads(self, filePath, localFilePath, threadNumber=20):
        """
        Set up the threads that will download the file.
        Returns a list of (not yet started) threads on success, otherwise None.
        """
        try:
            # Some servers reject SIZE in ASCII mode, so switch to binary first.
            self.ftp.voidcmd('TYPE I')
            # The reply looks like '213 <size>'; the second field is the size in bytes.
            temp = self.ftp.sendcmd('SIZE ' + filePath)
            remoteFileSize = int(temp.split()[1])
            # Integer division: offsets and block sizes must be whole bytes.
            blockSize = remoteFileSize // threadNumber
            rest = None
            threads = []
            # The first threadNumber - 1 threads each download blockSize bytes.
            for i in range(0, threadNumber - 1):
                beginPoint = blockSize * i
                subThread = threading.Thread(target=self.downloadFileMultiThreads,
                                             args=(i, filePath, localFilePath, beginPoint, blockSize, rest))
                threads.append(subThread)

            # The last thread also takes the remainder left over by the integer division.
            assigned = blockSize * threadNumber
            unassigned = remoteFileSize - assigned
            lastBlockSize = blockSize + unassigned
            beginPoint = blockSize * (threadNumber - 1)
            subThread = threading.Thread(target=self.downloadFileMultiThreads,
                                         args=(threadNumber - 1, filePath, localFilePath, beginPoint, lastBlockSize, rest))
            threads.append(subThread)
            return threads
        except Exception as diag:
            self.recordLog(str(diag), 'error')
            return None
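
    The block arithmetic in setupThreads can be tried on its own, without an FTP server. Below is a minimal sketch of the same calculation; the helper name split_into_blocks is my own for illustration and is not part of the original class.

```python
def split_into_blocks(file_size, thread_number):
    """Return a list of (begin_point, block_size) pairs covering file_size bytes."""
    if thread_number < 1:
        raise ValueError("thread_number must be at least 1")
    block_size = file_size // thread_number
    # Every block except the last one has exactly block_size bytes.
    blocks = [(block_size * i, block_size) for i in range(thread_number - 1)]
    # The last block absorbs the remainder left over by the integer division.
    last_begin = block_size * (thread_number - 1)
    blocks.append((last_begin, file_size - last_begin))
    return blocks


if __name__ == "__main__":
    for begin, size in split_into_blocks(1003, 4):
        print(begin, size)
```

    The block sizes always add up to the file size, so no byte is downloaded twice and none is skipped.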

    The downloadFileMultiThreads function is as follows:

    import os
    import threading
    from ftplib import FTP

    def downloadFileMultiThreads(self, threadIndex, remoteFilePath, localFilePath,
                                 beginPoint, blockSize, rest=None):
        """
        Worker run by one sub-thread: download blockSize bytes of the remote file,
        starting at offset beginPoint, into a temporary part file.
        The rest parameter is kept for call compatibility; the offset used is beginPoint.
        """
        threadName = threading.current_thread().name
        try:
            # another connection to the FTP server; change to the directory and set binary mode
            myFtp = FTP(self.host, self.user, self.passwd)
            myFtp.cwd(os.path.dirname(remoteFilePath))
            myFtp.voidcmd('TYPE I')

            finishedSize = 0
            beginToDownload = 'RETR ' + os.path.basename(remoteFilePath)
            # Let transfercmd send 'REST <beginPoint>' itself, right before RETR.
            # Sending REST by hand first would put PASV between REST and RETR in passive mode.
            connection = myFtp.transfercmd(beginToDownload, rest=beginPoint)
            readSize = self.fixBlockSize
            # temporary local file for this block
            with open(localFilePath + '.part.' + str(threadIndex), 'wb') as fp:
                while True:
                    if blockSize > 0:
                        # never request more than what is still missing from this block
                        remainedSize = blockSize - finishedSize
                        readSize = min(remainedSize, self.fixBlockSize)
                    data = connection.recv(readSize)
                    if not data:
                        break
                    fp.write(data)
                    finishedSize += len(data)
                    # stop once exactly blockSize bytes have been written
                    if finishedSize == blockSize:
                        break
            connection.close()
            try:
                myFtp.quit()
            except Exception:
                # The data connection is closed before the server has sent the whole file,
                # so the server may answer with an error reply; just drop the control connection.
                myFtp.close()
            return True
        except Exception as diag:
            self.recordLog(threadName + ': ' + str(diag), 'error')
            return False
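
    The core of the worker is the loop that stops after exactly blockSize bytes, even though the server keeps sending until the end of the file. Here is a minimal sketch of that loop in isolation. The function name read_block is hypothetical, and io.BytesIO.read stands in for the socket's recv so the example runs without a server.

```python
import io


def read_block(recv, block_size, chunk_size=8192):
    """Collect exactly block_size bytes (or fewer if the stream ends early)."""
    received = bytearray()
    while len(received) < block_size:
        # Never ask for more than what is still missing from this block.
        to_read = min(chunk_size, block_size - len(received))
        data = recv(to_read)
        if not data:  # the stream ended before the block was complete
            break
        received.extend(data)
    return bytes(received)


if __name__ == "__main__":
    # io.BytesIO.read plays the role of socket.recv in this illustration.
    stream = io.BytesIO(b"0123456789" * 10)
    stream.seek(30)  # plays the role of the REST offset
    block = read_block(stream.read, block_size=25, chunk_size=8)
    print(len(block))
```

    Capping each read at the number of bytes still missing is what keeps one thread from writing data that belongs to the next thread's block.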

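    2. Once all the downloads have finished, the blocks need to be merged. The merge process is covered in part two of this series: "Multi-threaded FTP Downloads in Python: Merging the Blocks Downloaded by Multiple Threads".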

    Thanks for reading; I hope this helps!

  • Original article: https://www.cnblogs.com/berlin-sun/p/Multi-threadingDownloadviaFTPwithPython.html