• [Repost] Python File Operations


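    Preface

    "File" here means more than an ordinary file on disk; it covers anything that can be treated as a file at an abstract level. For example, a Web page opened through a URL is a "file", and on Unix inter-process communication also goes through abstract "files". Because they all share one interface, abstract and concrete files are handled in the same way.

    The importance of file operations hardly needs explaining: to keep the result of a computation in any lasting form, you need files.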

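    I. The built-in functions open() and file()

    Usage of open:    file_object = open(file_name, access_mode = 'r', buffering = -1)

    • The returned file object, file_object, is iterable and can be traversed with a for loop.

    • file_name is the name of the file to open; a file outside the current directory must be given with its path.

    • access_mode is the mode in which the file is opened; the default is read-only ('r'). The mode strings follow the same conventions as fopen in C.

    • buffering sets the buffering mode; the default, -1, uses the system default buffering.

    • If the file cannot be opened, an IOError exception is raised.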

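    # The path can also be written as the raw string r'D:\Test.txt'
    >>> file_obj = open('D:\\Test.txt')  # default access_mode and default buffering
    >>> for line in file_obj:            # file_obj is iterable
    ...     print line                   # each line keeps its '\n', so a blank line follows
    ...
    This is line1

    I'm line2

    Hello, World!
    >>> file_obj.close()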
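Because a failed open() raises an exception instead of returning an error value, callers usually wrap the call in try/except. The following is a minimal sketch, not a definitive pattern: the helper name read_first_line and the file names are hypothetical, and the file is created in a temporary directory only for the demonstration. It uses only calls that exist in both Python 2.7 and Python 3 (where IOError is an alias of OSError).

```python
import os
import tempfile


def read_first_line(path):
    """Return the first line of path, or None if the file cannot be opened."""
    try:
        file_obj = open(path)          # default mode 'r', default buffering
    except IOError:                    # raised when the open fails
        return None
    try:
        return file_obj.readline()
    finally:
        file_obj.close()               # always release the file


# Demonstration in a throwaway directory (hypothetical file names).
demo_dir = tempfile.mkdtemp()
existing = os.path.join(demo_dir, 'Test.txt')
with open(existing, 'w') as out:
    out.write('This is line1\n')

print(read_first_line(existing))
print(read_first_line(os.path.join(demo_dir, 'missing.txt')))
```

Returning None for a missing file is just one possible design; re-raising or logging the error may suit other programs better.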

    The parameters in detail:

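    # File access modes

    r    read only                        rU   read, with universal newline support
    w    write                            a    append
    r+   read and write                   w+   read and write (truncates first)
    rb   binary read                      wb   binary write
    ab   binary append                    rb+  binary read and write
    wb+  binary read and write            ab+  binary read and append

    # 'r' requires the file to exist; otherwise an error is raised
    # 'w' truncates the file if it exists and creates it if it does not
    # 'a' appends from EOF if the file exists and creates a new file otherwise

    # buffering values

    0        unbuffered
    1        line buffered
    value>1  buffer of approximately value bytes
    value<0  system default buffering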
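The three rules about 'r', 'w' and 'a' can be checked with a short experiment. This is only a sketch under stated assumptions: the file names are hypothetical and everything happens inside a temporary directory.

```python
import os
import tempfile

demo_dir = tempfile.mkdtemp()
path = os.path.join(demo_dir, 'modes.txt')   # hypothetical file name

# 'w' creates the file, or truncates it if it already exists.
with open(path, 'w') as f:
    f.write('first\n')
with open(path, 'w') as f:
    f.write('second\n')          # 'first' is gone after this second open

# 'a' keeps the existing content and writes at the end.
with open(path, 'a') as f:
    f.write('third\n')

with open(path, 'r') as f:
    content = f.read()
print(content)

# 'r' requires the file to exist.
try:
    open(os.path.join(demo_dir, 'no_such_file.txt'), 'r')
    opened = True
except IOError:
    opened = False
print(opened)
```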

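    The built-in factory function file()

    file() and open() have the same behavior and usage and can be used interchangeably. open() is generally recommended.

    Universal Newline Support (UNS)

    Line terminators differ between platforms: Unix uses "\n" while Windows uses "\r\n". This is why a txt novel downloaded from the Internet sometimes shows no line breaks. In that case do not open it with Notepad; open it with WordPad (shipped with Windows) or a browser and fix it up there.

    To solve this problem Python provides Universal Newline Support: when a file is opened with a U mode, Python presents every line ending as "\n", the universal newline, which hides the differences between platforms.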
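The effect of universal newlines can be seen by writing mixed line endings and reading them back. Note the swap: the 'U' flag belongs to Python 2's open(), so this sketch uses io.open() with newline=None instead, which is available in Python 2.6+ and Python 3 and performs the same translation on reading. The file name is hypothetical.

```python
import io
import os
import tempfile

demo_dir = tempfile.mkdtemp()
path = os.path.join(demo_dir, 'mixed_newlines.txt')   # hypothetical file name

# Write three lines in binary mode, each with a different line terminator.
with open(path, 'wb') as f:
    f.write(b'unix\nwindows\r\nold mac\r')

# newline=None turns on universal newline translation when reading.
with io.open(path, 'r', newline=None) as f:
    lines = f.readlines()

print(lines)
```

Every line comes back terminated by '\n', whichever terminator was stored in the file.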

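    It is worth noting that cross-platform development runs into some unavoidable problems; differing line terminators and path separators are typical examples. Fortunately, Python's os module provides a few attributes that make cross-platform applications easier to write:

    os.linesep      (for the current system, likewise below) the string that terminates lines in a file

    os.sep             the string that separates the components of a path name

    os.pathsep      the string that separates the entries of a search path (such as PATH)

    os.curdir         the string that names the current working directory; "." on Windows and on Unix

    os.pardir        the string that names the parent of the current working directory; ".." on Windows and on Unix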
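A brief sketch of how these attributes keep path handling portable. The directory and file names are hypothetical, and nothing is created on disk.

```python
import os

# Build paths without hard-coding '/' or '\\'.
config_path = os.path.join(os.curdir, 'data', 'config.ini')   # hypothetical names
parent_path = os.path.join(os.pardir, 'logs')

print(repr(os.linesep))   # line terminator of the current platform
print(repr(os.sep))       # separator between path components
print(repr(os.pathsep))   # separator between entries of a search path
print(config_path)
print(parent_path)

# Join and split a search-path style string with os.pathsep.
search_path = os.pathsep.join(['first_dir', 'second_dir'])
entries = search_path.split(os.pathsep)
```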

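    II. Built-in methods and attributes of files

    Input

    file_obj.read(size)  Reads bytes from the file into a string, at most size bytes. If size is omitted or negative, the file is read to the end. Reading a whole file this way is not recommended for large files, because everything is loaded into memory at once.

    file_obj.readline(size)  Reads one line from the file and returns it as a string, including the line terminator. If size is given and non-negative, at most size bytes of the line are returned.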

    >>> FILE2 = open('README.txt', 'rU')
    >>> FILE2.readline()
    'This is Python version 2.7.1\n'
    >>> FILE2.readline()
    '============================\n'
    >>> FILE2.readline()
    '\n'
    >>> FILE2.readline()
    'Copyright (c) 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010\n'

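    file_obj.readlines()  Reads all the remaining lines (the file pointer is not necessarily at the beginning!) and returns them as a list of strings. The method loads everything that is left into memory at once, so it suits small files. Large files are usually read by iteration, which is covered later in this section.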
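The three input methods share one file pointer, so each call continues where the previous one stopped. A small sketch, assuming a hypothetical three-line file created in a temporary directory:

```python
import os
import tempfile

demo_dir = tempfile.mkdtemp()
path = os.path.join(demo_dir, 'sample.txt')   # hypothetical file name
with open(path, 'w') as f:
    f.write('line one\nline two\nline three\n')

f = open(path)
head = f.read(4)              # at most 4 characters
rest_of_line = f.readline()   # continues to the end of the first line
remaining = f.readlines()     # every line after the current position
at_eof = f.read()             # nothing is left, so an empty string
f.close()

print(repr(head))
print(repr(rest_of_line))
print(remaining)
print(repr(at_eof))
```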

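    Output

    file_obj.write()  The opposite of read() and readline(): it writes a string containing text or a block of binary data to the file.

    file_obj.writelines()  The opposite of readlines(): it takes a list of strings and writes them to the file. Line terminators are not added automatically; add them yourself where you need them.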
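The missing line terminators are the most common surprise with writelines(). The sketch below writes the same strings with and without a trailing '\n'; the file name and the sample strings are hypothetical.

```python
import os
import tempfile

demo_dir = tempfile.mkdtemp()
path = os.path.join(demo_dir, 'out.txt')   # hypothetical file name

names = ['alpha', 'beta', 'gamma']

with open(path, 'w') as f:
    f.write('header\n')
    # writelines() writes the strings back to back, so add '\n' ourselves.
    f.writelines(name + '\n' for name in names)
    # Without terminators the items run together on one line.
    f.writelines(names)

with open(path) as f:
    written = f.read()

print(repr(written))
```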

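    Moving the file pointer

    file_obj.seek( offset[,whence] )  seek() is similar to fseek in C. offset is the displacement and whence is the reference position, an optional argument that defaults to 0 (the beginning of the file); 1 means the current position and 2 means the end of the file.

    file_obj.tell( )  Similar to ftell in C; returns the current position of the file pointer relative to the beginning of the file.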

    >>> FILE = open('README.txt')
    >>> FILE.readline()
    'This is Python version 2.7.1\n'
    >>> FILE.tell()
    30L
    >>> FILE.seek(-5, 1)  # move back 5 bytes from the current position
    >>> FILE.read(5)
    '7.1\n='
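The three whence values can be compared side by side on a file whose content makes positions easy to see. This is a sketch with a hypothetical file name; binary mode is used on purpose, because Python 3 text files reject non-zero offsets relative to the current position or the end.

```python
import os
import tempfile

demo_dir = tempfile.mkdtemp()
path = os.path.join(demo_dir, 'digits.bin')   # hypothetical file name
with open(path, 'wb') as f:
    f.write(b'0123456789')

f = open(path, 'rb')
first = f.read(3)            # reads b'012'
pos_after_read = f.tell()    # pointer is now at offset 3
f.seek(-2, 1)                # whence=1: two bytes back from the current position
middle = f.read(2)
f.seek(-4, 2)                # whence=2: four bytes before the end of the file
tail = f.read()
f.seek(0)                    # whence defaults to 0: back to the beginning
start = f.read(1)
f.close()
```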

    Iterating over a file

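    # Method 1: use the file object's readlines() method
    # (readline() would return a single line and the loop would walk its characters)
    for lineItem in file_obj.readlines():
        process(lineItem)      # process() stands for your own handling code

    # Method 2: iterate over the file object directly
    for lineItem in file_obj:
        process(lineItem)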

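    Note that the file object is itself iterable, so method 2 is clearly preferable to method 1: it fetches lines on demand instead of building a list of all of them first.

    The benefit of iteration for very large files or network streams is obvious: it avoids the cost of loading a large file into memory in one go (which is sometimes impossible, as with a network stream). For small files it is still better to read everything at once, so that the file can be released as soon as possible.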
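As an illustration of line-by-line processing, the sketch below counts matching lines without ever holding the whole file in memory. The function name count_matching_lines, the file name and the log content are all hypothetical.

```python
import os
import tempfile


def count_matching_lines(path, keyword):
    """Count the lines containing keyword, fetching lines on demand."""
    count = 0
    with open(path) as file_obj:
        for line in file_obj:          # the file object yields one line per step
            if keyword in line:
                count += 1
    return count


demo_dir = tempfile.mkdtemp()
log_path = os.path.join(demo_dir, 'app.log')   # hypothetical file name
with open(log_path, 'w') as f:
    f.write('INFO start\nERROR disk full\nINFO retry\nERROR timeout\n')

print(count_matching_lines(log_path, 'ERROR'))
```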

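    A clean finish

    When you are done, never forget to call file_obj.close() to end access to the file. If you do not close the file explicitly, data still sitting in the output buffer may be lost.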
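Two common ways to make sure close() always runs are try/finally and the with statement (available since Python 2.6). The article itself does not mention with; it is shown here as an equivalent alternative. The file name is hypothetical.

```python
import os
import tempfile

demo_dir = tempfile.mkdtemp()
path = os.path.join(demo_dir, 'result.txt')   # hypothetical file name

# Explicit close, protected by try/finally so it runs even after an error.
file_obj = open(path, 'w')
try:
    file_obj.write('42\n')
finally:
    file_obj.close()
closed_after_finally = file_obj.closed

# Equivalent form: the with statement closes the file when the block exits.
with open(path) as file_obj:
    saved = file_obj.read()
closed_after_with = file_obj.closed
```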

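    Some other methods

    file_obj.fileno()  Returns the file descriptor of the open file, an integer that can be used for some low-level operations in the os module.

    file_obj.flush()   Writes the contents of the output buffer to the file immediately; close() calls it automatically.

    file_obj.isatty()  Returns True when the file is a tty device. A tty is a character terminal device, such as an old-style teleprinter or the terminal program of an operating system.

    file_obj.next()   Returns the next line of the file, much like readline(), but raises StopIteration when no lines are left.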
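A short sketch exercising these four methods on an ordinary disk file. Two assumptions to note: the file name is hypothetical, and the built-in next() is used in place of file_obj.next() because it works in Python 2.6+ (where it calls the method) as well as in Python 3.

```python
import os
import tempfile

demo_dir = tempfile.mkdtemp()
path = os.path.join(demo_dir, 'buffered.txt')   # hypothetical file name

writer = open(path, 'w')
writer.write('pending data')
writer.flush()                        # hand the buffered data to the operating system now

fd = writer.fileno()                  # integer descriptor usable with os.* calls
size_on_disk = os.fstat(fd).st_size   # low-level query through the descriptor
is_terminal = writer.isatty()         # a regular file is not a tty
writer.close()

reader = open(path)
first_line = next(reader)             # the only line in the file
try:
    next(reader)
    exhausted = False
except StopIteration:                 # raised because no lines are left
    exhausted = True
reader.close()
```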

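    Built-in attributes of files

    file.closed            True if the file has been closed

    file.encoding       the encoding used by the file

    file.mode             the mode in which the file was opened

    file.name             the file name

    file.newlines        the line terminators seen in the file: a string when there is only one kind, a tuple when there is more than one

    file.softspace       programmers rarely need this attribute; if you are curious, see help(file.softspace)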
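The most frequently used attributes can be inspected directly on an open file object. A minimal sketch with a hypothetical file name:

```python
import os
import tempfile

demo_dir = tempfile.mkdtemp()
path = os.path.join(demo_dir, 'attrs.txt')   # hypothetical file name

file_obj = open(path, 'w')
name_matches = (file_obj.name == path)   # name: the path that was passed to open()
mode = file_obj.mode                     # mode: the access mode string
closed_before = file_obj.closed          # False while the file is open
file_obj.close()
closed_after = file_obj.closed           # True once close() has run

print(mode)
```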

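    III. Standard files and command-line arguments

    Standard files

    In general, once a program is running it has access to three standard files: standard input (usually the keyboard), standard output (buffered output to the display) and standard error (unbuffered output to the screen). They keep their C names: stdin, stdout and stderr. The operating system opens all three in advance, so you only need their file handles to use them at any time.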

    >>> import sys
    >>> sys.stdout.write('hello, This works like print')
    hello, This works like print
    >>> sys.stderr.write('wahoo, This is a err I created!')
    wahoo, This is a err I created!    # IDLE shows this line in red, the error color
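One difference from print is that write() does not append a newline. The sketch below wraps that detail in a small helper; the name log_error and its stream parameter are hypothetical and only serve the illustration.

```python
import sys


def log_error(message, stream=None):
    """Write message plus a newline to stream (standard error by default)."""
    if stream is None:
        stream = sys.stderr
    stream.write(message + '\n')   # write() does not add the newline itself
    stream.flush()


sys.stdout.write('normal output\n')
log_error('something went wrong')
```

Passing the stream as a parameter keeps the helper easy to redirect, for example to an in-memory buffer or a log file.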

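    Command-line arguments

    The argv attribute of the sys module is the list of command-line arguments (Python has no counterpart of C's argc; take len(argv) instead). Use it as in the following example: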

    from sys import argv

    numOfArg = len(argv)    # argv[0] is the script name, so it is counted too

    print 'There are %d arguments in total.' % numOfArg
    for num in range(numOfArg):
        print 'Arg No.%d: %s' % (num + 1, argv[num])

    Save the code as arg.py and run it from the Windows command prompt, passing a few arguments, to see each one listed.


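    IV. The file system

    Python's os module implements the interface to the file system. Its operations include walking a directory tree and deleting or renaming files. In addition, the os.path module provides operations on path names.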

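    #--------- Functions in the os module ---------

    # File handling
    mkfifo()/mknod()     create a named pipe / create a file system node
    remove()/unlink()    delete a file
    rename()/renames()   rename a file
    stat()               return file information
    symlink()            create a symbolic link
    utime()              update the timestamps
    tmpfile()            create and open ('w+b') a new temporary file
    walk()               generate all the file names under a directory tree

    # Directories/folders
    chdir()/fchdir()     change the current working directory / do the same through a file descriptor
    chroot()             change the root directory of the current process
    listdir()            list the entries of the given directory
    getcwd()/getcwdu()   return the current working directory / same, but as a Unicode object
    mkdir()/makedirs()   create a directory / create nested directories
    rmdir()/removedirs() remove a directory / remove nested directories

    # Access/permissions
    access()             check permission modes
    chmod()              change the permission mode
    chown()/lchown()     change the owner and group ID / same, but without following symbolic links
    umask()              set the default permission mask

    # File descriptor operations
    open()               low-level operating system open (for files, use the standard built-in open)
    read()/write()       read and write through a file descriptor
    dup()/dup2()         duplicate a file descriptor / same, but onto another given descriptor

    # Device numbers
    makedev()            build a raw device number from major and minor device numbers
    major()/minor()      extract the major/minor device number from a raw device number

    #--------- Path name functions in the os.path module ---------

    # Splitting
    basename()           strip the directory path and return the file name
    dirname()            strip the file name and return the directory path
    join()               combine separate components into one path
    split()              return the tuple (dirname(), basename())
    splitdrive()         return the tuple (drivename, pathname)
    splitext()           return the tuple (filename, extension)

    # Information
    getatime()           return the time of last access
    getctime()           return the creation time (on Unix, the time of the last metadata change)
    getmtime()           return the time of last modification
    getsize()            return the file size

    # Queries
    exists()             does the path (file or directory) exist?
    isabs()              is the path absolute?
    isdir()              does the path exist and is it a directory?
    isfile()             does the path exist and is it a file?
    islink()             does the path exist and is it a symbolic link?
    ismount()            does the path exist and is it a mount point?
    samefile()           do the two path names refer to the same file?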
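A few of the os functions from the table, combined into one round trip: create nested directories, create and rename a file, inspect it, walk the tree, and clean up. This is a sketch only; all names are hypothetical and the work is confined to a fresh temporary directory.

```python
import os
import tempfile

root = tempfile.mkdtemp()                       # scratch directory for the demo

# Directories: makedirs() creates the intermediate level 'a' as well as 'b'.
nested = os.path.join(root, 'a', 'b')
os.makedirs(nested)

# Files: create, rename, inspect.
old_path = os.path.join(nested, 'draft.txt')    # hypothetical file names
new_path = os.path.join(nested, 'final.txt')
with open(old_path, 'w') as f:
    f.write('hello')
os.rename(old_path, new_path)
size = os.stat(new_path).st_size                # size in bytes

listing = os.listdir(nested)

# walk() yields (directory, subdirectories, files) for each level of the tree.
found = []
for dirpath, dirnames, filenames in os.walk(root):
    for name in filenames:
        found.append(os.path.join(dirpath, name))

# Clean up from the inside out: the file first, then each empty directory.
os.remove(new_path)
os.rmdir(nested)
os.rmdir(os.path.join(root, 'a'))
os.rmdir(root)
```

rmdir() is called level by level here instead of removedirs(), so the cleanup never climbs above the scratch directory.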

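    Beyond the functions listed above, the os module can also be used for process management and inter-process communication, which later articles will cover step by step.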
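The os.path splitting and query functions work on strings, so they can be tried without creating any files. A minimal sketch with a hypothetical relative path:

```python
import os.path

path = os.path.join('projects', 'demo', 'report.final.txt')   # hypothetical path

directory, filename = os.path.split(path)      # same as (dirname(path), basename(path))
stem, extension = os.path.splitext(filename)   # splits at the last dot only

print(directory)
print(filename)
print(stem)
print(extension)
print(os.path.isabs(path))                     # the path is relative
```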

    V. Some related modules

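    # File-related modules

    base64              encoding/decoding between binary strings and text strings
    binascii            encoding/decoding between binary data and ASCII-encoded binary strings
    bz2                 access to BZ2 compressed files
    csv                 access to comma-separated value (csv) files
    filecmp             comparing directories and files
    fileinput           line iterator over several text files
    getopt/optparse     command-line argument parsing
    glob/fnmatch        Unix-style wildcard matching
    gzip/zlib           reading and writing GNU zip files (compression requires the zlib module)
    shutil              high-level file operations
    cStringIO/StringIO  file-like interface for string objects
    tarfile             reading and writing TAR archives, including compressed ones
    tempfile            creating temporary files
    uu                  uuencode-format encoding and decoding
    zipfile             tools for reading and writing zip archives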

    When you need one of these modules, look up its usage in the Python reference documentation, or import it and call help() on it.
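To give a feel for how some of these modules fit together, the sketch below combines tempfile, glob, shutil and gzip. It is one possible combination chosen for illustration, not a recommended workflow, and every file name in it is hypothetical.

```python
import glob
import gzip
import os
import shutil
import tempfile

work_dir = tempfile.mkdtemp()                    # tempfile: scratch directory

# Create two small text files.
for name in ('a.txt', 'b.txt'):
    with open(os.path.join(work_dir, name), 'w') as f:
        f.write('content of ' + name + '\n')

# glob: Unix-style wildcard matching (sorted, since glob gives no order guarantee).
text_files = sorted(glob.glob(os.path.join(work_dir, '*.txt')))

# shutil: high-level copy.
backup = os.path.join(work_dir, 'a.bak')
shutil.copy(text_files[0], backup)
backup_exists = os.path.exists(backup)

# gzip: write a compressed copy and read it back.
archive = os.path.join(work_dir, 'a.txt.gz')
with open(text_files[0], 'rb') as src:
    data = src.read()
gz = gzip.open(archive, 'wb')
gz.write(data)
gz.close()
gz = gzip.open(archive, 'rb')
restored = gz.read()
gz.close()

shutil.rmtree(work_dir)                          # shutil: remove the whole tree
```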

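    Try writing some text-processing programs of your own, template parsing in particular, that combine string handling with file operations. Do not underestimate these tasks: the code is not easy to write and demands a great deal of care and patience. That also makes it a very good way to train your coding skills.

    For a deeper look at text processing and file operations such as archiving and compression, see the "Python Cookbook".

    Original post: http://greenlcat.diandian.com/post/2012-10-19/40039196726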

  • Source: https://www.cnblogs.com/I-Tegulia/p/4548363.html