• Splitting a large file with Python


    The Python code is as follows:

    import os
    import sys

    kilobytes = 1024
    megabytes = kilobytes * 1024       # 1 MB = 1024 * 1024 bytes
    chunksize = int(200 * megabytes)   # default chunk size: 200 MB

    def split(fromfile, todir, chunksize=chunksize):
        """Split fromfile into part files of at most chunksize bytes in todir."""
        if not os.path.exists(todir):  # create the output directory if needed
            os.mkdir(todir)
        else:
            # caution: every file already in todir is deleted
            for fname in os.listdir(todir):
                os.remove(os.path.join(todir, fname))
        partnum = 0
        with open(fromfile, 'rb') as inputfile:
            while True:
                chunk = inputfile.read(chunksize)  # read up to chunksize bytes
                if not chunk:                      # empty bytes means end of file
                    break
                partnum += 1
                filename = os.path.join(todir, 'data%04d' % partnum)
                with open(filename, 'wb') as fileobj:  # write one part file
                    fileobj.write(chunk)
        return partnum

    if __name__ == '__main__':
        fromfile = input('File to be split? ')
        todir = input('Directory to store part files? ')
        chunksize = int(input('Chunk size in bytes? '))
        absfrom, absto = map(os.path.abspath, [fromfile, todir])
        print('Splitting', absfrom, 'to', absto, 'by', chunksize)
        try:
            parts = split(fromfile, todir, chunksize)
        except Exception:
            print('Error during split:')
            print(sys.exc_info()[0], sys.exc_info()[1])
        else:
            print('Split finished:', parts, 'parts are in', absto)
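split() names the parts data0001, data0002, ..., so sorting the names restores their order. One way to check that a split is lossless is to concatenate the parts back into a single file. The following is a minimal sketch of that reverse step; `join` is a hypothetical name, not part of the original script, and it assumes the directory holds only the part files written by split().

```python
import os
import shutil


def join(fromdir, tofile):
    """Concatenate the part files in fromdir, in sorted name order, into tofile.

    Hypothetical companion to split(): assumes fromdir contains only the
    data0001, data0002, ... files that split() wrote. Returns the part count.
    """
    partnames = sorted(os.listdir(fromdir))  # zero-padded names sort in part order
    with open(tofile, 'wb') as output:
        for name in partnames:
            with open(os.path.join(fromdir, name), 'rb') as part:
                # copy in buffered blocks instead of loading a whole part into memory
                shutil.copyfileobj(part, output)
    return len(partnames)
```

With four-digit zero padding, the sort order is correct for up to 9999 parts. The rebuilt file can then be compared byte for byte with the original using `filecmp.cmp(original, rebuilt, shallow=False)`. Write the output file outside the part directory so it is not picked up as a part.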

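    Take data.txt as an example. This file is a 1.1 GB dataset of random numbers generated with Python, and it is split into 128 MB part files (the last part holds whatever remains).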
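The chunk-size prompt expects a byte count, so a 128 MB part size has to be entered as 128 × 1024 × 1024 = 134217728. The small helper below does that conversion and predicts how many part files split() writes, which is the ceiling of the file size divided by the chunk size. The names `chunk_bytes` and `expected_parts` are hypothetical and not part of the original script; the sketch assumes binary units (1 MB = 1024 × 1024 bytes).

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def chunk_bytes(megabytes):
    """Convert a size in binary megabytes (1 MB = 1024 * 1024 bytes) to bytes."""
    return int(megabytes * MB)


def expected_parts(file_size, chunksize):
    """Number of part files split() writes: ceil(file_size / chunksize).

    Integer ceiling division avoids float rounding on large sizes;
    an empty file yields 0 parts, matching split()'s read loop.
    """
    return (file_size + chunksize - 1) // chunksize


if __name__ == '__main__':
    size = int(1.1 * GB)               # roughly the size of the example data.txt
    chunk = chunk_bytes(128)
    print(chunk)                       # 134217728: the value to type at the prompt
    print(expected_parts(size, chunk)) # 9
```

Whether 1.1 GB is read as 1.1 × 10⁹ or 1.1 × 1024³ bytes, the result is 9 parts: eight full 128 MB parts plus a smaller final one.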

  • Original article: https://www.cnblogs.com/lijinze-tsinghua/p/10000903.html