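1. Requirement
In Python, you want to apply a regular expression to the entire contents of a file without reading the whole file into memory, which matters most for large files.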
2. Solution
Use the re module for the regular-expression matching and the mmap module to memory-map the file.
3. Example
(Source: Stack Overflow, "How do I re.search or re.match on a whole file without reading it all into memory?")
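You can use mmap to map the file into memory. The file contents can then be searched much like an ordinary string; in Python 3 the mapping is a bytes-like object, so the pattern must be a bytes pattern:

    import mmap
    import re

    # Open read-only; ACCESS_READ maps the file without needing write permission.
    with open('/var/log/error.log', 'rb') as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as data:
            # An mmap object is bytes-like, so the pattern is written as rb'...'.
            mo = re.search(rb'error: (.*)', data)
            if mo:
                print("found error", mo.group(1).decode(errors='replace'))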
This also works for large files: the operating system loads the file contents from disk only as they are needed, so the whole file never has to sit in memory at once.
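The same idea extends from a single match to every match in the file. The following is a minimal sketch, not part of the original answer: the helper name find_error_messages and its default pattern are hypothetical, chosen only for illustration. It assumes Python 3, and it checks for an empty file first because mmap refuses to map a file of length zero.

```python
import mmap
import os
import re


def find_error_messages(path, pattern=rb"error: (.*)"):
    """Return the first capture group (as bytes) of every match in the file.

    Hypothetical helper for illustration; `pattern` must be a bytes pattern
    because the memory-mapped file is a bytes-like object.
    """
    # mmap raises ValueError for an empty file, so handle that case up front.
    if os.path.getsize(path) == 0:
        return []
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as data:
            # finditer scans the mapping lazily; each group() call copies
            # only the matched slice out of the mapping into a bytes object.
            return [mo.group(1) for mo in re.finditer(pattern, data)]
```

Calling find_error_messages('/var/log/error.log') would return a list of bytes objects, one per matching line. Only the matched text is copied into memory, so the result stays small as long as the matches themselves are small.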
4. References
For details on the re and mmap modules, see the Python documentation.