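1. How to add, delete, modify, look up, and traverse lists, tuples, dictionaries, and sets.
(1) List
# -*- coding: utf-8 -*-
my_list = ['a', 'b', 2019, 'c']
print("type of my_list:", type(my_list))

# Look up: by index and by slice
print("my_list[2]:", my_list[2])
print("my_list[1:3]:", my_list[1:3])
print("my_list:", my_list)

# Modify: assign to an index
my_list[2] = 2018
print("my_list[2] after modification:", my_list[2])

# Add: append to the end, or insert at a position
my_list.append(2)
print("my_list after append:", my_list)
my_list.insert(1, 'd')
print("my_list after insert:", my_list)

# Delete: remove the fifth element (index 4)
del my_list[4]
print("my_list after deleting the fifth element:", my_list)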
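(2) Tuple
# -*- coding: utf-8 -*-
tup1 = ('a', 'b', 3, 'c')
tup2 = ('d', 2)
print("type of tup1:", type(tup1))
print("tup1:", tup1)
print("tup2:", tup2)

# Tuples are immutable, so "adding" means building a new tuple by concatenation
tup3 = tup1 + tup2
print("concatenated tuple:", tup3)

# Negative indices count from the end; -2 is the second-to-last element
print("second-to-last element:", tup3[-2])

# Traverse
for item in tup3:
    print(item)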
(3) Dictionary
# -*- coding: utf-8 -*-
my_dict = {'a': 2, 'b': 'c', 'name': 'jack'}
print("type of my_dict:", type(my_dict))

# Add: assign to a new key
my_dict['name1'] = 'mike'

# Modify: assign to an existing key
print("my_dict['b']:", my_dict['b'])
my_dict['b'] = 'd'
print("my_dict['b']:", my_dict['b'])

# Delete: remove a key and its value
print("my_dict:", my_dict)
del my_dict['name']
print("my_dict after deletion:", my_dict)
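(4) Set
# -*- coding: utf-8 -*-
# set() iterates over its argument, so set('a') is the one-element set {'a'}
set1 = set('a')
set2 = {'c', 'd'}
print("type of set1:", type(set1))
print("type of set2:", type(set2))
print("original set1:", set1)

# Add an element
set1.add("hello")
print("set1 after add:", set1)

# Remove an element (raises KeyError if it is missing)
set1.remove("a")
print("set1 after remove:", set1)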
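The scripts above cover adding, deleting, and modifying, but only the tuple example traverses its container. The following is a minimal sketch of traversal and membership lookup for all four types; the variable names and sample data are illustrative and not taken from the original scripts.

```python
# Illustrative data; these names are assumptions, not from the scripts above.
letters = ['a', 'b', 'c']            # list
point = (1, 2, 3)                    # tuple
ages = {'jack': 20, 'mike': 22}      # dictionary
tags = {'x', 'y'}                    # set

# Lists and tuples: iterate directly, or use enumerate() when the index is needed.
for index, letter in enumerate(letters):
    print(index, letter)
for value in point:
    print(value)

# Dictionaries: items() yields (key, value) pairs.
for name, age in ages.items():
    print(name, age)

# Sets have no defined order, so sort first for reproducible output.
for tag in sorted(tags):
    print(tag)

# Lookup: the `in` operator works on all four containers
# (for a dictionary it tests the keys).
membership = ('b' in letters, 2 in point, 'jack' in ages, 'z' in tags)
print(membership)
```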
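2. Summary of the similarities and differences between lists, tuples, dictionaries, and sets.
Property | List | Tuple | Dictionary | Set |
Brackets | [] | () | {} | {} |
Ordering | ordered | ordered | insertion-ordered (Python 3.7+) | unordered |
Mutability | mutable | immutable | mutable | mutable |
Duplicates | allowed | allowed | keys unique, values may repeat | not allowed |
Storage and lookup | by index | by index | by key | by value |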
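Several of the properties in the table can be checked directly in the interpreter. The snippet below is a small sketch with made-up sample values; it shows that tuples reject item assignment, that sets discard duplicates while lists keep them, and that dictionary keys are unique (values may still repeat) and keep insertion order on Python 3.7 or later.

```python
# Tuples are immutable: item assignment raises TypeError.
point = (1, 2)
try:
    point[0] = 9
    tuple_is_mutable = True
except TypeError:
    tuple_is_mutable = False

# Lists keep duplicates; sets discard them.
numbers = [1, 1, 2]
unique_numbers = set(numbers)

# Dictionary keys are unique: assigning to an existing key overwrites its value.
scores = {'a': 1}
scores['a'] = 2

# Dictionaries preserve insertion order (guaranteed since Python 3.7).
ordered = {}
ordered['z'] = 1
ordered['a'] = 2
key_order = list(ordered)

print(tuple_is_mutable, len(numbers), len(unique_numbers), scores, key_order)
# prints: False 3 2 {'a': 2} ['z', 'a']
```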
3. Word frequency count
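1. Download a full-length novel and save it as a UTF-8 encoded text file (file)
2. Read the file into a string (str)
3. Preprocess the text
4. Split the text into words (list)
5. Count the words (set, dict)
6. Sort by frequency: list.sort(key=lambda), tuple
7. Exclude grammatical words that carry no meaning, such as pronouns, articles, and conjunctions
- use a custom stop-word list
- or use stops.txt
8. Output the top 20
9. Visualize the result as a word cloud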
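Step 7 mentions either a custom stop-word list or a stops.txt file. The sketch below shows one way to load such a file into a set and filter a word list with it. The file format is an assumption (words separated by whitespace or newlines), the helper names `load_stop_words` and `remove_stop_words` are hypothetical, and a temporary file stands in for the real stops.txt.

```python
import os
import tempfile

def load_stop_words(path):
    """Read a stop-word file into a set (assumes whitespace-separated words)."""
    with open(path, 'r', encoding='utf-8') as f:
        return set(f.read().lower().split())

def remove_stop_words(words, stop_words):
    """Return the words that are not in the stop-word set, keeping their order."""
    return [w for w in words if w not in stop_words]

# Demonstration with a temporary file standing in for stops.txt.
with tempfile.TemporaryDirectory() as tmp:
    stops_path = os.path.join(tmp, 'stops.txt')
    with open(stops_path, 'w', encoding='utf-8') as f:
        f.write('the\nand\nof\n')
    stop_words = load_stop_words(stops_path)

kept = remove_stop_words(['the', 'whale', 'and', 'the', 'sea'], stop_words)
print(kept)  # prints: ['whale', 'sea']
```

Using a set for the stop words keeps each membership test fast, which matters when the word list comes from a whole novel.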
Save the sorted word list, word, to a CSV file:
import pandas as pd
pd.DataFrame(data=word).to_csv('big.csv',encoding='utf-8')
Generate the word cloud with an online tool:
https://wordart.com/create
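# -*- coding: utf-8 -*-
import pandas as pd

sep = '.,:;()"!'

# Read the whole novel as one lowercase string
with open(r'C:\Users\zy\Desktop\novel.txt', 'r', encoding='utf8') as fo:
    text = fo.read().lower()

# Replace punctuation with spaces, then split into a list of words
for ch in sep:
    text = text.replace(ch, ' ')
text = text.split()

# Distinct words minus the stop words
exclude = {'a', 'the', 'and', 'i', 'you', 'in', 'but', 'not', 'with', 'by', 'its',
           'for', 'of', 'an', 'to', 'his', 'he', 'was', 'her', 'that', 'had', 'him',
           'has', 'it', 'their', 'which', 'my', 'so', 'she', 'be', 'as', 'they', 'all'}
settext = set(text) - exclude

# Count how often each remaining word occurs
textDict = {}
for w in settext:
    textDict[w] = text.count(w)

# Sort the (word, count) pairs by count, highest first
word = list(textDict.items())
word.sort(key=lambda x: x[1], reverse=True)

# Print the top 20 and save the full list for the word cloud
for w, count in word[:20]:
    print(w, count)
pd.DataFrame(data=word).to_csv('novel1.csv', encoding='utf-8')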
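The script above calls text.count() once per distinct word, and each call rescans the whole word list, which can be slow on a long novel. As an alternative (not the approach used above), the counting and sorting steps could be done in a single pass with `collections.Counter` from the standard library. The function name `top_words`, its parameters, and the sample sentence are illustrative assumptions.

```python
from collections import Counter

def top_words(text, stop_words, n=20):
    """Return the n most frequent words in text, skipping stop words."""
    separators = '.,:;()"!?'
    text = text.lower()
    # Replace punctuation with spaces so split() yields clean words.
    for ch in separators:
        text = text.replace(ch, ' ')
    # Counter tallies every word in one pass over the list.
    counts = Counter(w for w in text.split() if w not in stop_words)
    # most_common(n) returns (word, count) pairs, highest count first.
    return counts.most_common(n)

sample = "The cat saw the dog. The dog saw the dog!"
result = top_words(sample, {'the', 'a'}, n=2)
print(result)  # prints: [('dog', 3), ('saw', 2)]
```

The returned list of (word, count) tuples has the same shape as the sorted `word` list, so it could be passed to `pd.DataFrame(...).to_csv(...)` in the same way.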