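(I) Working with large files
1. Opening and closing files
(1) Opening and closing a file explicitly:
f = open('user.txt', encoding='utf-8')
result = f.read()
f.close()  # must be called explicitly; if an error occurs before this line, the file stays open
(2) A with open() statement opens the file and closes it automatically when the block ends, managing the context for you:
with open('user.txt', encoding='utf-8') as f:
    result = f.read()
# f is already closed here
(3) File-reading flowchart: [figure not included]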
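(4) A very large file would take too much memory to read in one go. Instead, iterate over the file object, which reads one line at a time:
with open('user.txt', encoding='utf-8') as f:  # f is the file object (file handle)
    for line in f:                               # each iteration reads the next line
        line = line.strip()                      # remove leading/trailing whitespace, including '\n'
        if line:                                 # skip blank lines
            print(line)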
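Iterating line by line assumes the file actually contains line breaks. For a large file with very long lines or none at all, a common alternative (not covered in the original) is to read fixed-size chunks with f.read(size). A minimal sketch; the helper name count_chars_in_chunks, the chunk size and the demo file are illustrative assumptions:

```python
import os
import tempfile

def count_chars_in_chunks(file_name, chunk_size=1024):
    """Read a text file chunk by chunk so only one chunk is in memory at a time."""
    total = 0
    with open(file_name, encoding='utf-8') as f:
        while True:
            chunk = f.read(chunk_size)  # at most chunk_size characters
            if not chunk:               # an empty string means end of file
                break
            total += len(chunk)         # process the chunk here (we just count it)
    return total

# Usage on a throwaway file: 5000 characters and no line breaks
path = os.path.join(tempfile.gettempdir(), 'demo_large.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('a' * 5000)
print(count_chars_in_chunks(path, chunk_size=1024))  # 5000
```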
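(5) Modifying file contents
For a small file, the usual steps are:
# 1. Read the entire file content
# 2. Replace the old string with new_str
# 3. Clear the original file
# 4. Write the new content back
# These steps only suit small files, because the whole file is held in memory.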
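The four small-file steps can be sketched as follows. This is a minimal version, assuming UTF-8 text; the function name replace_in_small_file and the demo file are hypothetical. Opening with mode 'r+' lets one file object both read and rewrite the file:

```python
import os
import tempfile

def replace_in_small_file(file_name, old_str, new_str):
    """Read the whole file, replace text, then overwrite the file in place."""
    with open(file_name, 'r+', encoding='utf-8') as f:
        content = f.read()                           # 1. read everything
        content = content.replace(old_str, new_str)  # 2. replace old_str with new_str
        f.seek(0)                                    # 3. go back to the start...
        f.truncate()                                 #    ...and clear the old content
        f.write(content)                             # 4. write the new content

# Usage on a throwaway file
path = os.path.join(tempfile.gettempdir(), 'demo_small.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('hello world\nhello python\n')
replace_in_small_file(path, 'hello', 'hi')
with open(path, encoding='utf-8') as f:
    result = f.read()
print(result)
```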
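# Modifying a large file: read it line by line, write each processed line to a new file,
# then replace the original file with the new one.
# Example: convert all letters in words.txt to uppercase
import os

with open('words.txt', encoding='utf-8') as fr, open('words_new.txt', 'w', encoding='utf-8') as fw:
    for line in fr:
        line = line.strip()
        if line:
            line = line.upper()
            fw.write(line + '\n')   # strip() removed the newline, so add it back
os.remove('words.txt')                   # delete the original file
os.rename('words_new.txt', 'words.txt')  # give the new file the original name
# Contents of words.txt before running:
# hello,how are you?
# I am fine.Thank you!
# Just do it!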
(6) Exercise: monitoring a log file
Sample log file, access.log:
178.210.90.90 - - [04/Jun/2017:03:44:13 +0800] "GET /wp-includes/logo_img.php HTTP/1.0" 302 161 "http://nnzhp.cn/wp-includes/logo_img.php" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4" "10.3.152.221"
178.210.90.90 - - [04/Jun/2017:03:44:13 +0800] "GET /blog HTTP/1.0" 301 233 "http://nnzhp.cn/wp-includes/logo_img.php" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4" "10.3.152.221"
178.210.90.90 - - [04/Jun/2017:03:44:15 +0800] "GET /blog/ HTTP/1.0" 200 38278 "http://nnzhp.cn/wp-includes/logo_img.php" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4" "10.3.152.221"
66.249.75.29 - - [04/Jun/2017:03:45:55 +0800] "GET /bbs/forum.php?mod=forumdisplay&fid=574&filter=hot HTTP/1.1" 200 17482 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
37.9.169.20 - - [04/Jun/2017:03:47:59 +0800] "GET /wp-admin/security.php HTTP/1.1" 302 161 "http://nnzhp.cn/wp-admin/security.php" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4" "-"
37.9.169.20 - - [04/Jun/2017:03:48:01 +0800] "GET /blog HTTP/1.1" 301 233 "http://nnzhp.cn/wp-admin/security.php" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4" "-"
37.9.169.20 - - [04/Jun/2017:03:48:02 +0800] "GET /blog/ HTTP/1.1" 200 38330 "http://nnzhp.cn/wp-admin/security.php" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4" "-"
37.9.169.20 - - [04/Jun/2017:03:48:21 +0800] "GET /wp-admin/security.php HTTP/1.1" 302 161 "http://nnzhp.cn/wp-admin/security.php" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4" "-"
37.9.169.20 - - [04/Jun/2017:03:48:21 +0800] "GET /blog HTTP/1.1" 301 233 "http://nnzhp.cn/wp-admin/security.php" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4" "-"
121.69.45.254 - - [04/Jun/2017:10:29:45 +0800] "POST /dsx/wp-admin/admin-ajax.php HTTP/1.1" 200 47 "http://www.imdsx.cn/dsx/wp-admin/post.php?post=723&action=edit" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36" "-"
66.249.69.65 - - [04/Jun/2017:10:30:45 +0800] "GET /bbs/forum.php?mod=guide&view=new&page=1 HTTP/1.1" 200 58386 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
121.69.45.254 - - [04/Jun/2017:10:30:45 +0800] "POST /dsx/wp-admin/admin-ajax.php HTTP/1.1" 200 147 "http://www.imdsx.cn/dsx/wp-admin/post.php?post=723&action=edit" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36" "-"
66.249.69.85 - - [04/Jun/2017:10:31:42 +0800] "GET /people/137/answers HTTP/1.1" 200 11563 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
121.69.45.254 - - [04/Jun/2017:10:31:45 +0800] "POST /dsx/wp-admin/admin-ajax.php HTTP/1.1" 200 47 "http://www.imdsx.cn/dsx/wp-admin/post.php?post=723&action=edit" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36" "-"
207.46.13.77 - - [04/Jun/2017:10:32:03 +0800] "GET /people/119/credits HTTP/1.1" 200 13453 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" "-"
121.69.45.254 - - [04/Jun/2017:10:32:45 +0800] "POST /dsx/wp-admin/admin-ajax.php HTTP/1.1" 200 147 "http://www.imdsx.cn/dsx/wp-admin/post.php?post=723&action=edit" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36" "-"
106.39.140.161 - - [04/Jun/2017:10:33:05 +0800] "GET /blog/ HTTP/1.1" 200 38330 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:53.0) Gecko/20100101 Firefox/53.0" "-"
106.39.140.161 - - [04/Jun/2017:10:33:14 +0800] "POST /blog/category/python/ HTTP/1.1" 200 26338 "http://www.nnzhp.cn/blog/" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:53.0) Gecko/20100101 Firefox/53.0" "-"
106.39.140.161 - - [04/Jun/2017:10:33:28 +0800] "POST /blog/category/python/ HTTP/1.1" 200 26338 "http://www.nnzhp.cn/blog/category/python/" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:53.0) Gecko/20100101 Firefox/53.0" "-"
106.39.140.161 - - [04/Jun/2017:10:33:35 +0800] "GET /favicon.ico HTTP/1.1" 302 161 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:53.0) Gecko/20100101 Firefox/53.0" "-"
106.39.140.161 - - [04/Jun/2017:10:33:36 +0800] "GET /blog/ HTTP/1.1" 200 38330 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:53.0) Gecko/20100101 Firefox/53.0" "-"
121.69.45.254 - - [04/Jun/2017:10:33:45 +0800] "POST /dsx/wp-admin/admin-ajax.php HTTP/1.1" 200 47 "http://www.imdsx.cn/dsx/wp-admin/post.php?post=723&action=edit" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36" "-"
106.39.140.161 - - [04/Jun/2017:10:33:50 +0800] "POST /blog/category/python/page/2/ HTTP/1.1" 200 26768 "http://www.nnzhp.cn/blog/category/python/" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:53.0) Gecko/20100101 Firefox/53.0" "-"
125.34.79.127 - - [04/Jun/2017:10:33:54 +0800] "POST /blog/page/4/ HTTP/1.1" 200 20332 "http://www.nnzhp.cn/blog/page/3/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36" "-"
125.34.79.127 - - [04/Jun/2017:10:34:00 +0800] "POST /blog/page/3/ HTTP/1.1" 200 26398 "http://www.nnzhp.cn/blog/page/4/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36" "-"
125.34.79.127 - - [04/Jun/2017:10:34:04 +0800] "POST /blog/2016/12/19/python%e5%ad%a6%e4%b9%a0%e7%ac%94%e8%ae%b0%e4%b8%89%e6%96%87%e4%bb%b6%e6%93%8d%e4%bd%9c%e5%92%8c%e9%9b%86%e5%90%88/ HTTP/1.1" 200 77986 "http://www.nnzhp.cn/blog/page/3/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36" "-"
66.249.69.65 - - [04/Jun/2017:10:34:32 +0800] "GET /bbs/forum.php?mod=guide&view=newthread&page=8 HTTP/1.1" 200 59357 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
121.69.45.254 - - [04/Jun/2017:10:34:46 +0800] "POST /dsx/wp-admin/admin-ajax.php HTTP/1.1" 200 147 "http://www.imdsx.cn/dsx/wp-admin/post.php?post=723&action=edit" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36" "-"
121.69.45.254 - - [04/Jun/2017:10:35:46 +0800] "POST /dsx/wp-admin/admin-ajax.php HTTP/1.1" 200 47 "http://www.imdsx.cn/dsx/wp-admin/post.php?post=723&action=edit" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36" "-"
121.69.45.254 - - [04/Jun/2017:10:36:28 +0800] "POST /dsx/wp-admin/post.php HTTP/1.1" 302 0 "http://www.imdsx.cn/dsx/wp-admin/post.php?post=723&action=edit" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36" "-"
10.10.10.124 - - [04/Jun/2017:10:36:29 +0800] "POST /dsx/wp-cron.php?doing_wp_cron=1496543788.7288680076599121093750 HTTP/1.1" 200 0 "http://www.imdsx.cn/dsx/wp-cron.php?doing_wp_cron=1496543788.7288680076599121093750" "WordPress/4.7.5; http://www.imdsx.cn:80/dsx" "-"
121.69.45.254 - - [04/Jun/2017:10:36:30 +0800] "GET /dsx/wp-admin/post.php?post=723&action=edit&message=1 HTTP/1.1" 200 166220 "http://www.imdsx.cn/dsx/wp-admin/post.php?post=723&action=edit" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36" "-"
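Program analysis diagram: [figure not included]
The code:
# Monitor the log file: every minute, find IP addresses with more than 200 requests and add them to the blacklist
import time

point = 0  # file position where the previous pass stopped reading
while True:
    ips = {}  # request count per IP for this pass
    with open('access.log', encoding='utf-8') as f:
        f.seek(point)  # skip the lines already processed (the first pass reads the whole log)
        for line in f:
            line = line.strip()
            if line:
                ip = line.split()[0]  # the IP is the first field of each line
                if ip in ips:
                    ips[ip] += 1
                else:
                    ips[ip] = 1
        point = f.tell()  # remember where this pass ended
    for ip in ips:
        count = ips[ip]
        if count > 200:
            print('IP address to blacklist: %s' % ip)
    time.sleep(60)  # wait one minute before the next pass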
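To check the counting logic without a live log or the one-minute sleep, the per-pass work can be pulled into a function. A minimal sketch, assuming the IP is always the first whitespace-separated field as in access.log above; the function name find_blacklist_ips and the use of collections.Counter (in place of the manual dict) are choices made here, not part of the original:

```python
from collections import Counter

def find_blacklist_ips(lines, threshold=200):
    """Count requests per IP and return the IPs whose count exceeds threshold."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line:
            counts[line.split()[0]] += 1  # first field of the log line is the IP
    return [ip for ip, n in counts.items() if n > threshold]

# Tiny fake log: 3 requests from one IP, 1 from another
sample = ['1.1.1.1 - - "GET / HTTP/1.1" 200 10'] * 3 + ['2.2.2.2 - - "GET / HTTP/1.1" 200 10']
print(find_blacklist_ips(sample, threshold=2))  # ['1.1.1.1']
```

Keeping the counting separate from the while True loop makes it easy to test with a small threshold, as above.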
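(II) List comprehensions and conditional (ternary) expressions
1. List comprehension: a simple yet powerful built-in Python syntax for creating lists.
For example:
a = [1, 2, 3, 4, 5, 6]
b = []
# The ordinary way:
for i in a:
    b.append(str(i))
print(b)
# With a list comprehension:
c = [str(i) for i in a]
d = [str(i) for i in a if i % 2 != 0]  # keep only the odd numbers
print(c)  # ['1', '2', '3', '4', '5', '6']
print(d)  # ['1', '3', '5']
2. Execution order of a list comprehension: the for/if clauses are nested. The first clause after the leading expression is the outermost loop, each clause further to the right is nested one level deeper, and the leading expression on the far left is evaluated last, at the innermost level.
[x * y for x in range(1, 5) if x > 2 for y in range(1, 4) if y < 3]  # [3, 6, 4, 8]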
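The nesting rule can be checked by writing the same comprehension out as explicit loops; both forms produce the same list:

```python
# The list comprehension from the text
result = [x * y for x in range(1, 5) if x > 2 for y in range(1, 4) if y < 3]

# The same logic as explicit nested loops
expanded = []
for x in range(1, 5):                   # first for clause: outermost loop
    if x > 2:
        for y in range(1, 4):           # next for clause: one level deeper
            if y < 3:
                expanded.append(x * y)  # leading expression: evaluated innermost

print(result)    # [3, 6, 4, 8]
print(expanded)  # [3, 6, 4, 8]
```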
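3. Where a loop would be verbose, a list comprehension can replace it with a single line, for example to build the squares of 1 to 10:
[x * x for x in range(1, 11)]  # [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]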
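4. Conditional (ternary) expressions
Syntax: value_if_true if condition else value_if_false
When to use: to pick a value based on a condition, typically in an assignment, keeping the code short.
For example:
age = 18
# Ordinary if statement
if age < 18:
    print('minor')
else:
    print('adult')
# Conditional expressions
# Form 1
teenager = 'minor' if age < 18 else 'adult'
print(teenager)
# Form 2: index a dict with the boolean result
print({True: 'minor', False: 'adult'}[age < 18])
# Form 3: index a tuple with the boolean (False == 0, True == 1)
print(('adult', 'minor')[age < 18])
In the code above, forms 2 and 3 are compact but harder to understand. They also compute both candidate values before choosing one, whereas form 1 evaluates only the branch it selects.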
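To see the evaluation difference, the sketch below records which values actually get computed. The helper label() and the list calls are illustrative names introduced here:

```python
calls = []  # records every value that actually gets computed

def label(text):
    """Record that this value was computed, then return it."""
    calls.append(text)
    return text

age = 18

# Form 1: only the selected branch is evaluated
a = label('minor') if age < 18 else label('adult')
first_calls = list(calls)
print(a, first_calls)   # adult ['adult']

# Form 3: the whole tuple is built first, so both values are computed
calls.clear()
b = (label('adult'), label('minor'))[age < 18]
second_calls = list(calls)
print(b, second_calls)  # adult ['adult', 'minor']
```

This matters when a branch is expensive or has side effects (or would fail), which is another reason to prefer form 1.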
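(III) Sets: set() creates an unordered collection that removes duplicates automatically. Create a set with braces { } or the set() function; an empty set must be written set(), because {} is an empty dict.
1. Basic examples:
# Sets: duplicates removed automatically, no order
l = [1, 2, 11, 1, 1, 3, 5, 7]
l2 = {1, 2, 3, 4, 5, 1, 1}
l3 = set()  # empty set
s = set(l)  # converting a list to a set removes duplicates
print(s)
print(l2)   # the duplicate 1s are gone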
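2. Adding, removing and updating elements
(1) Add an element: add()
l3 = set()
l3.add(1)
print(l3)  # {1}
(2) Remove an element: remove() (raises KeyError if the element is not in the set)
l = {1, 2, 3, 4, 5, 6, 7}
l.remove(1)
print(l)
(3) Add all elements of one set to another: update()
l2 = {1, 2, 3, 4, 5, 1, 1}
l3 = set()  # empty set
l3.update(l2)
print(l3)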
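A short sketch pulling the element operations together, plus discard(), a standard set method not mentioned above that removes an element without raising an error when it is missing:

```python
s = {1, 2, 3}
s.add(4)               # {1, 2, 3, 4}
s.remove(1)            # {2, 3, 4}
s.discard(100)         # no error: discard() ignores missing elements
s.update([3, 5], {6})  # update() accepts any iterables; duplicates are ignored
print(sorted(s))       # [2, 3, 4, 5, 6]

try:
    s.remove(100)      # remove() raises KeyError for a missing element
except KeyError:
    print('100 is not in the set')
```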
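3. Intersection, union, difference and symmetric difference
(1) Intersection: the elements present in both sets (.intersection() or the & operator)
stu1 = ['Amy', 'Mike', 'Jack', 'Bob']
stu2 = ['Mary', 'Mike', 'Amy', 'Henry']
stu1_set = set(stu1)
stu2_set = set(stu2)
print(stu1_set.intersection(stu2_set))  # intersection
print(stu1_set & stu2_set)              # intersection, operator form
Example: check whether a password is valid, i.e. contains at least one digit, one lowercase letter and one uppercase letter:
import string

password = 'abc123'
password_set = set(password)
# A non-empty intersection is truthy, so each & checks that one character class is present
if (password_set & set(string.digits)
        and password_set & set(string.ascii_lowercase)
        and password_set & set(string.ascii_uppercase)):
    print('valid password')
else:
    print('invalid password')  # 'abc123' has no uppercase letter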
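(2) Union: all elements of both sets combined
# Union: merge two sets
s1 = {1, 2, 3, 4}
s2 = {4, 5, 6, 7}
print(s1.union(s2))
print(s1 | s2)
(3) Difference: the elements in one set that are not in the other
# Difference
s1 = {1, 2, 3, 4}
s2 = {4, 5, 6, 7}
print(s1.difference(s2))  # elements in s1 but not in s2
print(s1 - s2)
(4) Symmetric difference: combine both sets, then drop the elements they have in common (the intersection)
# Symmetric difference (s1 and s2 as above)
print(s1 ^ s2)
print(s1.symmetric_difference(s2))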
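A worked check of the four operations on the s1 and s2 used above. sorted() is applied only so the printed order is predictable, since sets themselves are unordered:

```python
s1 = {1, 2, 3, 4}
s2 = {4, 5, 6, 7}

print(sorted(s1 & s2))  # [4]
print(sorted(s1 | s2))  # [1, 2, 3, 4, 5, 6, 7]
print(sorted(s1 - s2))  # [1, 2, 3]
print(sorted(s1 ^ s2))  # [1, 2, 3, 5, 6, 7]

# The symmetric difference is the union minus the intersection
print((s1 | s2) - (s1 & s2) == s1 ^ s2)  # True
```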
4. Sets also support iteration with a for loop (the order is not guaranteed):
s1 = {1, 2, 3, 4}
for s in s1:
    print(s)
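(IV) JSON: a JSON document is just a string, but in a standard format that virtually every language can parse. In Python, import the json module to work with it.
(Note: a Python dict literal may use single quotes '' or double quotes "", but JSON allows only double quotes.)
import json
1. json.dumps(): encodes Python data (list, tuple, dict, etc.) and returns a JSON string
d = {'name': 'Mary', 'hobby': ['reading', 'running', 'hiking'], 'house': (4, 5, 6), 'addr': '北京', 'age': 18, 'sex': 'male'}
# Convert the Python object to a JSON string.
# ensure_ascii=False keeps non-ASCII characters (such as Chinese) readable instead of \uXXXX escapes;
# indent=<number> pretty-prints with that many spaces of indentation
result = json.dumps(d, ensure_ascii=False, indent=4)
print(d)
print(result)
print(type(result))  # <class 'str'>
2. json.loads(): decodes a JSON string back into a Python data structure
json_str = '{"name": "Mary", "hobby": ["reading", "running", "hiking"], "house": [4, 5, 6]}'
dict2 = json.loads(json_str)
print(dict2)  # {'name': 'Mary', 'hobby': ['reading', 'running', 'hiking'], 'house': [4, 5, 6]}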
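Two details of the round trip are worth seeing directly: what ensure_ascii changes, and the fact that a tuple comes back from JSON as a list (JSON has only arrays). A small sketch using a trimmed copy of the dict d from above:

```python
import json

d = {'name': 'Mary', 'house': (4, 5, 6), 'addr': '北京'}

s_default = json.dumps(d)                       # non-ASCII escaped as \uXXXX
s_readable = json.dumps(d, ensure_ascii=False)  # characters kept as-is
print(s_default)   # {"name": "Mary", "house": [4, 5, 6], "addr": "\u5317\u4eac"}
print(s_readable)  # {"name": "Mary", "house": [4, 5, 6], "addr": "北京"}

back = json.loads(s_readable)
print(back['house'], type(back['house']))  # [4, 5, 6] <class 'list'>
print(json.loads(s_default) == back)       # True: both strings decode to the same data
```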
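3. json.load(): reads JSON from a file object and decodes it into Python data
4. json.dump(): encodes Python data as JSON and writes it to a file object
# Writing: json.dump(d, f) does the same as f.write(json.dumps(d))
# (d is the dictionary from example 1)
with open('info.txt', 'w', encoding='utf-8') as f:
    json_str = json.dumps(d, indent=4, ensure_ascii=False)
    f.write(json_str)
with open('info.txt', 'w', encoding='utf-8') as f:
    json.dump(d, f, indent=4, ensure_ascii=False)  # handles the file writing for you

# Reading: json.load(f) does the same as json.loads(f.read())
with open('info.txt', encoding='utf-8') as f:
    content = f.read()
    d = json.loads(content)  # convert the string content into a dict
with open('info.txt', encoding='utf-8') as f:
    d = json.load(f)         # handles the file reading for you
5. Example:
# Write the data to a file with json.dump()
with open('info.txt', 'w', encoding='utf-8') as fw:
    json.dump(d, fw, ensure_ascii=False, indent=4)
# Read the file back with json.load()
with open('info.txt', encoding='utf-8') as fr:
    d = json.load(fr)
print(d)
print(d.get('name'))
print(d.get('hobby'))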
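(V) Functions: organized, reusable blocks of code that implement a single task or a group of related tasks.
1. Defining a function: a function block starts with the def keyword, followed by the function name and parentheses (); any parameters are listed inside the parentheses. return [expression] ends the function and passes a value back to the caller; a function without return returns None.
def hello():  # defining a function makes code reusable; the body runs only when the function is called
    print('nihao')
2. Calling a function runs its body, for example: hello()
Defining functions with def:
Example 1:
import string

def check_password(password):  # checks whether a password is valid; password is a required (positional) parameter
    password_set = set(password)
    if (password_set & set(string.digits)
            and password_set & set(string.ascii_lowercase)
            and password_set & set(string.ascii_uppercase)):
        print('valid')
        return True
    else:
        print('invalid')
        return False

password_result = check_password('abcA123')
print(password_result)  # True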
Example 2: tell the user the current time; no required parameters and no return value
import datetime

def baoshi():  # baoshi: "report the time"
    print('Current time:', datetime.datetime.today())

baoshi()
Example 3: multiple parameters
# Define the function
def write_file(file_name, content):
    with open(file_name, 'w', encoding='utf-8') as fw:
        fw.write(content)

# Call it
write_file(content='anbc', file_name='hihi.txt')  # keyword arguments, in any order
write_file('b.txt', 'nikdkkd')                     # positional arguments
write_file('c.txt', content='abc123')             # positional first, then keyword
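3. Default-value parameters
Example: a function that writes to a file when content is given and reads the file otherwise
def read_file(file_name):
    with open(file_name, encoding='utf-8') as fr:
        return fr.read()

# Default-value parameter: content defaults to None
def op_file(file_name, content=None):
    print(content)
    if content:
        write_file(file_name, content)  # write_file from example 3
    else:
        result = read_file(file_name)
        return result

print(op_file('b.txt'))             # content not passed: prints None (the default), then the file's contents
op_file('b.txt', 'goodafternoon!')  # content passed: the text is written to the file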
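4. Exercise: a function ends as soon as it reaches return
Example: check whether a string is a valid decimal number
Analysis: [figure not included]
The code:
# Check for a decimal number, e.g. '1.5'
def is_float(s):
    s = str(s)
    if s.count('.') == 1:
        left, right = s.split('.')
        if left.isdigit() and right.isdigit():  # positive decimal
            return True
        # negative decimal: exactly one leading '-' followed by digits
        if left.startswith('-') and left[1:].isdigit() and right.isdigit():
            return True
    return False

print(is_float('1.5'))  # True
print(is_float('.3'))   # False: the part before the dot is empty
# As soon as a return is reached, the function ends immediately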
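The early exit on return is easiest to see in a loop: the first return stops both the loop and the function. A small sketch with a hypothetical helper first_negative:

```python
def first_negative(numbers):
    """Return the first negative number, or None if there is none."""
    for n in numbers:
        if n < 0:
            return n  # leaves the function immediately; the loop stops here
    return None       # reached only when the loop finishes without returning

print(first_negative([3, 1, -4, -1]))  # -4
print(first_negative([1, 2]))          # None
```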
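5. Example: return values
(1) A function without a return statement returns None.
(2) A function that returns several values returns them as a tuple.
def test():
    print('hello')

def test2():
    return 1, 2, 3

print(test())      # prints hello, then None
print(test2())     # (1, 2, 3)
a, b, c = test2()  # unpack the tuple
print(a, b, c)     # 1 2 3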
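(VI) Global and local variables
1. Global variables: variables defined outside all functions. Their scope is the whole program (module), so they can be used both outside and inside functions. They are usually defined at the top of the file.
(1) Example 1: using a global variable
country = 'China'  # global variable

def say():
    print(country)  # reads the global variable
    word = 'nihao'
    print(word)

def Amy():
    country = 'Japan'  # a local variable named country: Python looks in the function's own scope first, then in the global scope
    print(country)

say()  # China, then nihao
Amy()  # Japan (the global country is still 'China')
(2) To create or modify a global variable inside a function, declare it with the global keyword:
def text():
    global add
    add = 'Climb one more floor to see a thousand miles further'
    print('accessed inside the function:', add)

text()
print('accessed outside the function:', add)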
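The difference global makes shows up as soon as a function assigns to a name. A minimal sketch with hypothetical function names: without global, assignment creates a local variable; and a function that both reads and assigns a name without global fails, because Python treats the name as local throughout that function:

```python
count = 0  # global variable

def set_local():
    count = 100      # assignment creates a separate local variable
    return count

def add_one_global():
    global count     # assignments now target the global variable
    count += 1

def add_one_broken():
    count += 1       # count is local here (it is assigned) but has no value yet

set_local()
print(count)         # 0: the global was not touched
add_one_global()
print(count)         # 1

try:
    add_one_broken()
except UnboundLocalError as e:
    print('UnboundLocalError:', e)
```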
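Exercise 1: what is the final value of money?
money = 500

def test(consume):
    return money - consume

def test1(money):
    return test(money) + money

money = test1(money)
print(money)
Exercise 2: what does this print?
def test():
    global a
    a = 5

def test1():
    c = a + 5
    return c

res = test1()
print(res)
Answers: Exercise 1 prints 500. test1(500) calls test(500), which reads the global money (500) and returns 500 - 500 = 0; adding the parameter money (500) gives 500. Exercise 2 raises NameError: name 'a' is not defined, because test() is never called, so the global a is never created.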
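2. Local variables: variables defined inside a function; their scope is limited to that function.
# Local variables
def say():
    print(country)  # country is the global from the example above
    word = 'nihao'  # word is local: it exists only while say() runs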
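3. Getting the variables of a given scope
(1) globals(): a Python built-in function that returns a dictionary of all variables in the global scope; each key is a variable name and each value is that variable's value.
Example:
Calling globals() returns a dictionary of all global variables. Through this dictionary you can access a given variable and even change its value, as shown in the figure below: [figure not included]
(2) locals(): another built-in function; it returns a dictionary of the variables in the current scope. Called inside a function, it gives the function's local variables (changing that dictionary does not change the actual local variables); called at module level, it behaves the same as globals().
Example: [figure not included]
As the figure shows, calling locals() at module level returns the same result as globals(): besides your own variables, the dictionary contains many entries Python creates automatically, such as __name__, __doc__ and __builtins__.
(3) vars(object): a built-in function that returns a dictionary of the attributes of the given object (its __dict__). Called without an argument, vars() does exactly the same as locals().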
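Since the original figures are not available, here is a small sketch of the three functions. The names counter, show_scopes and Point are illustrative only:

```python
counter = 1  # a global variable

def show_scopes(x):
    y = x * 2
    return locals()          # only the local variables of this call

globals()['counter'] = 10    # writing through globals() changes the global
print(counter)               # 10
print(show_scopes(3))        # {'x': 3, 'y': 6}

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

print(vars(Point(1, 2)))     # {'x': 1, 'y': 2}: the instance's attributes
```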
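(VII) Variable-length arguments and keyword arguments
1. Required parameters, also called positional parameters
2. Default-value parameters
3. Optional parameters (*args), also called a parameter group
# *args: optional, accepts any number of arguments and packs them into a tuple
def send(*args):
    for p in args:
        print('Sending SMS to %s' % p)

send()               # no arguments: nothing is printed
send(110)
send(110, 120, 119)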
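4. Keyword arguments (**kwargs)
# **kwargs: optional, accepts any number of arguments and packs them into a dict,
# but every argument must be passed as name=value
def send_sms(**kwargs):
    print(kwargs)

send_sms()                  # {}
send_sms(Bob='happy noon')  # {'Bob': 'happy noon'}
send_sms(Amy='good morning', Mary='good evening', John='good afternoon')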
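5. Parameter order: required parameters, then default-value parameters, then the parameter group (*args), then keyword arguments (**kwargs)
def or_fun(name, age, country='China', sex='male', *args, **kwargs):
    # 1. required: name, age
    # 2. default values: country, sex
    # 3. parameter group: *args
    # 4. keyword arguments: **kwargs
    print(name)
    print(age)
    print(sex)
    print(country)
    print(args)
    print(kwargs)

# Extra positional values go to args; extra name=value pairs go to kwargs.
# A keyword that repeats a parameter name (e.g. name=1) would raise TypeError.
or_fun('Bob', 18, 'America', 'male', 'kdkdk', 'asjc', a=1, b=2, c=3)
# Prints: Bob / 18 / male / America / ('kdkdk', 'asjc') / {'a': 1, 'b': 2, 'c': 3}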
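To confirm what Python actually packs into the two parameter groups, a tiny function can simply return them. The name collect is illustrative:

```python
def collect(*args, **kwargs):
    """Return what Python packed into *args and **kwargs."""
    return args, kwargs

args, kwargs = collect(110, 120, a=1, b=2)
print(type(args), args)      # <class 'tuple'> (110, 120)
print(type(kwargs), kwargs)  # <class 'dict'> {'a': 1, 'b': 2}
print(collect())             # ((), {})
```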
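6. Unpacking arguments when calling a function
def xzh(name, age, sex):
    print(name)
    print(age)
    print(sex)

l = ['xzh', 18, 'girl']
d = {'name': 'xzh', 'age': 18, 'sex': 'girl'}
xzh(*l)   # * unpacks the list into positional arguments, in order
xzh(**d)  # ** unpacks the dict into keyword arguments; the keys must match the parameter names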
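A quick check that both unpacking styles pass the same values, and what happens when a dict key does not match a parameter name. This version of xzh returns its arguments instead of printing them, purely so the result can be compared:

```python
def xzh(name, age, sex):
    return name, age, sex

l = ['xzh', 18, 'girl']
d = {'name': 'xzh', 'age': 18, 'sex': 'girl'}
print(xzh(*l) == xzh(**d))  # True: both give ('xzh', 18, 'girl')

bad = {'name': 'xzh', 'age': 18, 'gender': 'girl'}
try:
    xzh(**bad)              # 'gender' is not a parameter name
except TypeError as e:
    print('TypeError:', e)
```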