1. set
s1 = set()
A typical use is web crawling: after visiting a URL, put it into a set(); if a URL is already in the set, there is no need to crawl it again.
s1 = set()
s1.add('wohaoshuai')
s1.add('wohaoshuai')   # adding the same element again has no effect
print(s1)              # {'wohaoshuai'}
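To make the crawler idea above concrete, here is a minimal sketch of URL de-duplication with a set; the urls list and the fetch() helper are hypothetical placeholders, not part of the original notes.

def fetch(url):
    # stand-in for real download logic (e.g. requests.get)
    print('crawling', url)

seen = set()
urls = ['http://a.com', 'http://b.com', 'http://a.com']  # hypothetical URLs

for url in urls:
    if url in seen:    # average O(1) membership test
        continue       # already crawled, skip it
    seen.add(url)
    fetch(url)
# a.com and b.com are each fetched only once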
1. Membership tests are fast (average O(1) lookup).
2. Duplicates are eliminated automatically, as shown in the short example below.
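A quick illustration of points 1 and 2; the element values here are arbitrary.

nums = set([1, 2, 2, 3, 3, 3])
print(nums)        # {1, 2, 3} -- duplicates are dropped on construction
print(2 in nums)   # True -- membership test backed by a hash table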
3. difference() returns a new set containing the elements that are not in the given iterable.
s2 = set([1,2,3,4])
s3 = s2.difference([1,2])
print(s2)   # {1, 2, 3, 4}  (s2 itself is unchanged)
print(s3)   # {3, 4}
4. The & and | operators compute the intersection and union of two sets.
a = set([1,2,3])
b = set([3,4,5])
c = a & b   # intersection
d = a | b   # union
print(c)    # {3}
print(d)    # {1, 2, 3, 4, 5}
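The same results are available through the named methods intersection() and union(); unlike the operators, the methods also accept any iterable, not just another set. A small sketch with the same values as above:

a = set([1, 2, 3])
b = set([3, 4, 5])
print(a.intersection(b))   # {3} -- same as a & b
print(a.union([3, 4, 5]))  # {1, 2, 3, 4, 5} -- a plain list works here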
5. difference() keeps the elements that are only in the first set; symmetric_difference() keeps the elements that are in exactly one of the two sets.
s1 = set([11,22,33])
s2 = set([22,44,33,55])
res1 = s1.difference(s2)             # in s1 but not in s2
res2 = s1.symmetric_difference(s2)   # in exactly one of the two sets
print(res1)   # {11}
print(res2)   # {11, 44, 55}
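For reference, both operations also have operator forms, - and ^; a short sketch using the same sets as above:

s1 = set([11, 22, 33])
s2 = set([22, 44, 33, 55])
print(s1 - s2)   # {11} -- equivalent to s1.difference(s2)
print(s1 ^ s2)   # {11, 44, 55} -- equivalent to s1.symmetric_difference(s2)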