1. open() and read()
open() opens a file; read() reads its contents.
    with open('a.txt', 'r') as f:
        data = f.read()
        print(data)
        print("file encoding:", f.encoding)
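Output:
end action end action end action 2016-08-06 17:08:23 end action 2016-08-06 17:08:23 end action 2016-08-06 17:08:23 end action
file encoding: cp936
f.encoding is the encoding the file object uses to decode the file. No encoding argument was passed to open() here, so Python fell back to the platform default, which was cp936 on this machine.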
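Because the default encoding depends on the platform, passing encoding explicitly makes the result reproducible. The following is a minimal sketch, not the original author's code; the temporary directory, the file name, and the sample text are assumptions made up for illustration.

```python
import os
import tempfile

# Hypothetical sample file; the name and contents are for illustration only.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "a.txt")

# Write with an explicit encoding so the bytes on disk are known.
with open(path, "w", encoding="utf-8") as f:
    f.write("end action\n")

# Read back with the same explicit encoding instead of the platform default.
with open(path, "r", encoding="utf-8") as f:
    data = f.read()
    used_encoding = f.encoding

print(data)
print("file encoding:", used_encoding)
```

With the encoding spelled out, the same script should behave the same way on Windows, Linux, and macOS.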
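2. write(), flush(), and close()
write(): writes data to the file;
flush(): flushes the buffer, pushing pending data out to the file;
close(): closes the file.
2.1 Fetch https://www.baidu.com and save the response
Note: response.text is decoded using the encoding stored in response.encoding, which you can print to check. Open baidu.txt for writing with that same encoding; otherwise the saved file contains garbled text.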
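The three calls can be tried together on a throwaway file. This is a minimal sketch under stated assumptions: the file name demo.txt and the temporary directory are hypothetical, and the with block at the end is shown only as the common alternative to calling close() by hand.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")  # hypothetical file name

f = open(path, "w", encoding="utf-8")
chars_written = f.write("hello\n")  # write() returns the number of characters written
f.flush()                           # push buffered data to the operating system
f.close()                           # release the file handle; close() also flushes
closed_after = f.closed

# The with statement closes (and therefore flushes) the file automatically.
with open(path, "a", encoding="utf-8") as g:
    g.write("world\n")

with open(path, "r", encoding="utf-8") as h:
    content = h.read()

print(repr(content))
```

An explicit flush() mainly matters when another process needs to see the data before the file is closed.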
    import requests

    response = requests.get('https://www.baidu.com')
    # print(response.headers)
    print(type(response.text))
    # Re-encode with the guessed encoding, then decode the bytes as UTF-8
    print(response.text.encode("ISO-8859-1").decode("utf-8"))
    print(response.encoding)
    # The file encoding must match response.encoding
    f = open('baidu.txt', 'w', encoding='ISO-8859-1')
    f.write(response.text)
    f.flush()
    f.close()
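Why does writing with ISO-8859-1 produce a file that can later be read as UTF-8? A minimal sketch without any network access, assuming (as in the code above) that the server sent UTF-8 bytes and the text was decoded as ISO-8859-1; the sample string is made up for illustration.

```python
original = "百度一下"                     # sample text, standing in for the page content
raw_bytes = original.encode("utf-8")     # the bytes as they would travel over the network

# Decoding UTF-8 bytes as ISO-8859-1 never fails, but the result is garbled.
garbled = raw_bytes.decode("ISO-8859-1")

# Encoding the garbled text back with ISO-8859-1 restores the original bytes ...
recovered_bytes = garbled.encode("ISO-8859-1")
# ... and those bytes decode correctly as UTF-8.
recovered = recovered_bytes.decode("utf-8")

print(recovered == original)
```

ISO-8859-1 maps every byte value to exactly one character, so the round trip loses nothing. The bytes written to baidu.txt are therefore the server's original UTF-8 bytes.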
2.2 Read the saved data and display it
    # Read the saved data and display it
    with open('baidu.txt', 'r', encoding='utf-8') as f:
        data = f.read()
        print(data)
        print("file encoding:", f.encoding)
Output:
<!DOCTYPE html> <!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta content=always name=referrer><link rel=stylesheet type=text/css href=https://ss1.bdstatic.com/5eN1bjq8AAUYm2zgoY3K/r/www/cache/bdorz/baidu.min.css><title>百度一下,你就知道</title></head> <body link=#0000cc> <div id=wrapper> <div id=head> <div class=head_wrapper> <div class=s_form> <div class=s_form_wrapper> <div id=lg> <img hidefocus=true src=//www.baidu.com/img/bd_logo1.png width=270 height=129> </div> <form id=form name=f action=//www.baidu.com/s class=fm> <input type=hidden name=bdorz_come value=1> <input type=hidden name=ie value=utf-8> <input type=hidden name=f value=8> <input type=hidden name=rsv_bp value=1> <input type=hidden name=rsv_idx value=1> <input type=hidden name=tn value=baidu><span class="bg s_ipt_wr"><input id=kw name=wd class=s_ipt value maxlength=255 autocomplete=off autofocus=autofocus></span><span class="bg s_btn_wr"><input type=submit id=su value=百度一下 class="bg s_btn" autofocus></span> </form> </div> </div> <div id=u1> <a href=http://news.baidu.com name=tj_trnews class=mnav>新闻</a> <a href=https://www.hao123.com name=tj_trhao123 class=mnav>hao123</a> <a href=http://map.baidu.com name=tj_trmap class=mnav>地图</a> <a href=http://v.baidu.com name=tj_trvideo class=mnav>视频</a> <a href=http://tieba.baidu.com name=tj_trtieba class=mnav>贴吧</a> <noscript> <a href=http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u=http%3A%2F%2Fwww.baidu.com%2f%3fbdorz_come%3d1 name=tj_login class=lb>登录</a> </noscript> <script>document.write('<a href="http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u='+ encodeURIComponent(window.location.href+ (window.location.search === "" ? "?" 
: "&")+ "bdorz_come=1")+ '" name="tj_login" class="lb">登录</a>'); </script> <a href=//www.baidu.com/more/ name=tj_briicon class=bri style="display: block;">更多产品</a> </div> </div> </div> <div id=ftCon> <div id=ftConw> <p id=lh> <a href=http://home.baidu.com>关于百度</a> <a href=http://ir.baidu.com>About Baidu</a> </p> <p id=cp>©2017 Baidu <a href=http://www.baidu.com/duty/>使用百度前必读</a> <a href=http://jianyi.baidu.com/ class=cp-feedback>意见反馈</a> 京ICP证030173号 <img src=//www.baidu.com/img/gs.gif> </p> </div> </div> </div> </body> </html>
file encoding: utf-8
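3. Reading a file in a loop
    f = open('baidu.txt', 'r', encoding='utf-8')
    count = 0
    for line in f:  # the file object yields one line per iteration
        if count == 3:
            # Print a divider in place of the fourth line
            print("--------divider----------")
            count += 1
            continue
        print(line)
        count += 1
    f.close()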
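The manual counter can also be replaced by enumerate(). The following is a sketch of that variation, not the original code; the sample file lines.txt and its contents are hypothetical, and the with block is used so the file is closed automatically.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "lines.txt")  # hypothetical sample file
with open(path, "w", encoding="utf-8") as f:
    f.write("line 0\nline 1\nline 2\nline 3\nline 4\n")

shown = []
with open(path, "r", encoding="utf-8") as f:
    # Iterating over the file object reads one line at a time.
    for index, line in enumerate(f):
        if index == 3:
            shown.append("--------divider----------")
            continue
        shown.append(line.rstrip("\n"))  # drop the newline so print() does not double it

for item in shown:
    print(item)
```

Iterating line by line keeps memory use low, because the whole file is never held in memory at once.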
4. readline()
4.1 Reads one whole line from the file, including the trailing "\n" character.
    f = open('baidu.txt', 'r', encoding='utf-8')
    for i in range(5):
        print(f.readline())
    f.close()
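A small sketch of what readline() actually returns, assuming a hypothetical two-line sample file: each call keeps the newline, and an empty string signals the end of the file.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "two_lines.txt")  # hypothetical sample file
with open(path, "w", encoding="utf-8") as f:
    f.write("first\nsecond\n")

with open(path, "r", encoding="utf-8") as f:
    first = f.readline()    # the first line, newline included
    second = f.readline()   # the second line, newline included
    at_end = f.readline()   # nothing left to read

print(repr(first), repr(second), repr(at_end))
```

Because the newline is kept, print(f.readline()) shows a blank line after each line; print(line, end="") avoids that.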
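4.2 Reading up to a given number of characters from a line
In text mode, readline(n) returns at most n characters (not bytes) and stops early at the end of the line.
    f = open('baidu.txt', 'r', encoding='utf-8')
    data = f.readline(10)
    print(data)
    data2 = f.readline(10)
    print(data2)
    f.close()
Output: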
<!DOCTYPE
html>
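The output above can be reproduced on a small file. This is a sketch under stated assumptions: the file sample.html and its three lines are made up for illustration, with a Chinese line added to show that the size argument counts characters rather than bytes in text mode.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sample.html")  # hypothetical sample file
with open(path, "w", encoding="utf-8") as f:
    f.write("<!DOCTYPE html>\n<html>\n百度一下\n")

with open(path, "r", encoding="utf-8") as f:
    part1 = f.readline(10)  # at most 10 characters of the first line
    part2 = f.readline(10)  # the rest of that line; reading stops at the newline
    part3 = f.readline(10)  # the whole second line, which is shorter than 10 characters
    part4 = f.readline(2)   # 2 characters, which occupy 6 bytes in UTF-8

print(repr(part1), repr(part2), repr(part3), repr(part4))
```

A call never crosses a line boundary, so the second readline(10) returns only the remainder of the first line.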
5. Modifying a file
Replace every "我" ("I") in the file with "你" ("you") and write the result to a new file.
    find_str = "我"
    replace_str = "你"
    with open("yesterday2", "r", encoding="utf-8") as f, \
            open("yesterday2.bak", "w", encoding="utf-8") as f_new:
        print("old: ", f.readlines())
        f.seek(0)  # move the file pointer back to the start
        for line in f:
            if find_str in line:
                line = line.replace(find_str, replace_str)
            f_new.write(line)
        f_new.flush()
    print("*" * 50)
    with open("yesterday2.bak", "r", encoding="utf-8") as f_new:
        print("new: ", f_new.readlines())
Output:
old: ['Oh, yesterday when I was young\n', '噢 昨日当我年少轻狂\n', 'So many, many songs were waiting to be sung\n', '有那么那么多甜美的曲儿等我歌唱\n', 'So many wild pleasures lay in store for me\n', '有那么多肆意的快乐等我享受\n', 'And so much pain my eyes refused to see\n', '还有那么多痛苦 我的双眼却视而不见\n', "There are so many songs in me that won't be sung\n", '我有太多歌曲永远不会被唱起\n', 'I feel the bitter taste of tears upon my tongue\n', '我尝到了舌尖泪水的苦涩滋味\n', 'The time has come for me to pay for yesterday\n', '终于到了付出代价的时间 为了昨日\n', 'When I was young\n', '当我年少轻狂\n']
**************************************************
new: ['Oh, yesterday when I was young\n', '噢 昨日当你年少轻狂\n', 'So many, many songs were waiting to be sung\n', '有那么那么多甜美的曲儿等你歌唱\n', 'So many wild pleasures lay in store for me\n', '有那么多肆意的快乐等你享受\n', 'And so much pain my eyes refused to see\n', '还有那么多痛苦 你的双眼却视而不见\n', "There are so many songs in me that won't be sung\n", '你有太多歌曲永远不会被唱起\n', 'I feel the bitter taste of tears upon my tongue\n', '你尝到了舌尖泪水的苦涩滋味\n', 'The time has come for me to pay for yesterday\n', '终于到了付出代价的时间 为了昨日\n', 'When I was young\n', '当你年少轻狂\n']
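The same read, replace, and write pattern can be wrapped in a small helper. This is a sketch rather than the original code: the function name replace_in_file, its parameters, and the two-line sample file are all hypothetical.

```python
import os
import tempfile


def replace_in_file(src_path, dst_path, old, new, encoding="utf-8"):
    """Copy src_path to dst_path, replacing old with new; return the number of lines changed."""
    changed = 0
    with open(src_path, "r", encoding=encoding) as src, \
            open(dst_path, "w", encoding=encoding) as dst:
        for line in src:
            if old in line:
                line = line.replace(old, new)
                changed += 1
            dst.write(line)  # every line is written, changed or not
    return changed


# Hypothetical sample data in a temporary directory.
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "yesterday2")
dst_path = os.path.join(tmp_dir, "yesterday2.bak")
with open(src_path, "w", encoding="utf-8") as f:
    f.write("Oh, yesterday when I was young\n噢 昨日当我年少轻狂\n")

changed = replace_in_file(src_path, dst_path, "我", "你")
with open(dst_path, "r", encoding="utf-8") as f:
    new_lines = f.readlines()

print(changed, new_lines)
```

Writing to a separate file leaves the original untouched, so a failed run cannot corrupt the source data.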