Python Beginner Notes

#!/usr/bin/env python3 specifies the interpreter used to run the script

// is called floor division; floor-dividing two integers still yields an integer
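
A small sketch of how // behaves, added for illustration; the variable names are made up for this example. It rounds toward negative infinity rather than toward zero, which only shows up with negative operands.

```python
# Floor division rounds down (toward negative infinity), not toward zero.
q_pos = 7 // 2       # two ints give an int
q_neg = -7 // 2      # rounds down, so the result is -4 rather than -3
q_float = 7.0 // 2   # a float operand gives a float result
true_div = 7 / 2     # true division always returns a float

print(q_pos, q_neg, q_float, true_div)
```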

# -*- coding: utf-8 -*- tells the interpreter to read the source file as UTF-8

Function parameters

    *args collects variable positional arguments; args receives a tuple;

    **kw collects keyword arguments; kw receives a dict.
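
A minimal sketch of the two parameter kinds; the function name describe and the sample values are hypothetical, chosen only for this illustration.

```python
def describe(*args, **kw):
    """Return the collected positional and keyword arguments unchanged."""
    return args, kw


# Extra positional arguments land in a tuple, keyword arguments in a dict.
positional, keyword = describe(1, 2, city='Changsha')
print(type(positional).__name__, positional)
print(type(keyword).__name__, keyword)

# An existing list or dict can be unpacked into a call with * and **.
unpacked_args, unpacked_kw = describe(*[1, 2], **{'city': 'Changsha'})
```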

functools.partial(int, base=2) creates a partial function: a new callable with base preset to 2

    >>> int2 = functools.partial(int, base=2)

    >>> int2('10010')

    is equivalent to

    >>> kw = { 'base': 2 }

    >>> int('10010', **kw)
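
The equivalence above can be checked directly. This is a short sketch; the names via_partial, via_kwargs, and overridden are introduced here only for the comparison.

```python
import functools

int2 = functools.partial(int, base=2)
kw = {'base': 2}

via_partial = int2('10010')          # base=2 is filled in by the partial
via_kwargs = int('10010', **kw)      # the same call written out by hand
overridden = int2('10010', base=10)  # a keyword given at call time replaces the preset one

print(via_partial, via_kwargs, overridden)
```

The preset keyword is a default rather than a lock, so int2 can still be called with a different base when needed.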

# by http://blog.csdn.net/qq_32190139/article/details/78158158
# Hunan Agricultural University OJ http://210.43.224.19/oj/

import re
import urllib.request

import requests
from bs4 import BeautifulSoup

loginurl = 'http://210.43.224.19/oj/login.php'  # page the login form is actually submitted to
# The User-Agent lets the server identify the client's operating system and version, CPU type,
# browser and version, rendering engine, browser language, plugins, and so on
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}


def getcsrf():
    firsturl = 'http://210.43.224.19/oj/csrf.php'  # page that serves the csrf token
    page = urllib.request.urlopen(firsturl)  # send a GET request
    html = page.read()
    html = html.decode('utf-8')  # remember to decode the bytes
    t = re.compile(r'name="csrf" value="(.+)" ')  # match with a regular expression
    list1 = t.findall(html)  # find the value in the html
    return list1[0]


def login(datas, k):
    geturl = 'http://210.43.224.19/oj/problemset.php' + '?page=' + str(k)  # page to crawl after logging in
    s = requests.session()  # a requests.Session keeps certain parameters, such as cookies, across requests
    Post = s.post(loginurl, data=datas, headers=headers)  # post the login form
    # This page can actually be opened without cookies, so the login here is only a simulation
    Get = s.get(geturl, cookies=Post.cookies, headers=headers)
    return Get.content  # return the html


def download(html):
    Soup = BeautifulSoup(html, "html.parser")
    rows = Soup.find_all('tr', attrs={'class': ['evenrow', 'oddrow'], 'align': ''})
    url = 'http://210.43.224.19/oj/'
    for i in rows:
        # one line per problem: the row text padded to 60 columns, then the problem URL
        f.write("{:<60}".format(i.text) + url + i.find('a').attrs['href'] + '\n')


f = open("out1.txt", "w", encoding="utf-8")
k = 1
csrf = getcsrf()
paprm = {'user_id': 'cz476325514', 'password': 'zitong', 'csrf': csrf}  # field names taken from the network request sent at login
while k <= 13:
    html = login(paprm, k)
    download(html)
    k = k + 1
f.close()
# The markup of each problem row looks like this:
# <tr class="evenrow"><td> <div class="none"> </div></td><td> <div class="center">1000</div></td><td> <div class="left"><a href="problem.php?id=1000">我会算加法</a></div></td><td> <div class="center"><nobr>贺细平</nobr></div></td><td> <div class="center"><a href="status.php?problem_id=1000&amp;jresult=4">4656</a></div></td><td> <div class="center"><a href="status.php?problem_id=1000">8576</a></div></td></tr>

Simulated login to crawl the OJ problem list
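
The token extraction in getcsrf() can be tried offline. The sketch below runs the same greedy pattern against a made-up input tag (sample_html is an assumption; the real csrf.php response may look different) and compares it with a stricter pattern that stops at the first closing quote.

```python
import re

# Hypothetical form markup; not copied from the real site.
sample_html = '<input type="hidden" name="csrf" value="abc123" class="x" title="y" >'

greedy = re.compile(r'name="csrf" value="(.+)" ')    # pattern used in getcsrf()
strict = re.compile(r'name="csrf" value="([^"]+)"')  # stops at the first closing quote

greedy_token = greedy.findall(sample_html)[0]
strict_token = strict.findall(sample_html)[0]

print(greedy_token)
print(strict_token)
```

On markup with several attributes after value, the greedy group runs on to the last quote-plus-space it can find, so the stricter character class may be the safer choice if the page layout ever changes.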

nice code:

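str.capitalize() returns str with its first character uppercased and the rest lowercased

str.title() uppercases the first letter of every word in str and lowercases the rest

str.count(sub, start=0, end=len(string)) substring matching; returns the number of non-overlapping occurrences of sub in str

str.endswith(suffix[, start[, end]]) suffix is the string to match; returns True if str ends with it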
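
A quick sketch of these four string methods on one sample string; the string and variable names are made up for the example.

```python
s = 'hello WORLD from python'

capitalized = s.capitalize()   # only the first character is uppercased
titled = s.title()             # every word starts with an uppercase letter
o_total = s.count('o')         # lowercase 'o' only; the 'O' in WORLD does not match
o_in_hello = s.count('o', 0, 5)             # restrict the search to s[0:5]
ends_python = s.endswith('python')
ends_hello = s.endswith('hello', 0, 5)      # test the slice s[0:5] instead of the whole string

print(capitalized)
print(titled)
print(o_total, o_in_hello, ends_python, ends_hello)
```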

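calendar.month(2016, 1) returns the calendar for that month as a multi-line string (requires import calendar)

datetime.date.isocalendar() returns a tuple of the form (ISO year, ISO week number, ISO weekday); for example, date(2018, 6, 1).isocalendar() corresponds to (2018, 22, 5) https://blog.csdn.net/alvin930403/article/details/54089087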
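
A short sketch of both calls, for illustration. The variable names are assumptions; the second date is included only to show how a week number and weekday map back to a calendar date.

```python
import calendar
from datetime import date

# calendar.month returns a ready-to-print, multi-line string.
january = calendar.month(2016, 1)
print(january)

# isocalendar() yields (ISO year, ISO week number, ISO weekday), with Monday as weekday 1.
iso_year, iso_week, iso_weekday = date(2018, 6, 1).isocalendar()
print(iso_year, iso_week, iso_weekday)

# Saturday of ISO week 15 in 2017 is 15 April 2017.
other = tuple(date(2017, 4, 15).isocalendar())
print(other)
```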

join() list = [1, 2, 3, 4, 5] ','.join(map(str, list)) == '1,2,3,4,5' (join only accepts strings, so the integers are converted first)
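
A minimal sketch of join with a list of integers; the names numbers, joined, and raised are chosen for this example. Passing the integers directly raises TypeError, so each item is converted with str first.

```python
numbers = [1, 2, 3, 4, 5]

# Convert each item to str before joining.
joined = ','.join(str(n) for n in numbers)
print(joined)

# Joining the raw integers fails because join expects strings.
try:
    ','.join(numbers)
    raised = False
except TypeError:
    raised = True
print(raised)
```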

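math.factorial(num) returns the factorial of num

reversed(seq) seq: the sequence to reverse; it can be a tuple, string, list, or range. Returns a reverse iterator

enumerate() pairs each item of an iterable (such as a list, tuple, or string) with its index, yielding both the index and the value, e.g. [(0, 'Spring'), (1, 'Summer'), (2, 'Fall'), (3, 'Winter')]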
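
The three helpers can be sketched together as follows. Both reversed and enumerate return lazy iterators, so list() or join() is used here to materialize them; the variable names are only illustrative.

```python
import math

seasons = ['Spring', 'Summer', 'Fall', 'Winter']

indexed = list(enumerate(seasons))                    # indices start at 0
indexed_from_one = list(enumerate(seasons, start=1))  # optional start value
backwards = list(reversed(seasons))                   # the original list is left unchanged
reversed_text = ''.join(reversed('abc'))              # works on strings too
fact5 = math.factorial(5)

print(indexed)
print(backwards, reversed_text, fact5)
```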

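isupper() checks whether all cased letters in the string are uppercase (it returns False if there are no cased letters at all).

strip([chars]) removes the given characters from both ends of the string (whitespace by default)

str.encode(encoding='UTF-8', errors='strict') errors: sets how encoding errors are handled; the default, 'strict', raises UnicodeEncodeError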
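
A brief sketch of these three string methods, with made-up sample values. It shows that strip treats chars as a set of characters rather than a prefix or suffix, and how the errors argument changes what encode does with characters the target encoding cannot represent.

```python
# isupper ignores digits and spaces but needs at least one cased letter.
upper_mixed = 'HELLO 123'.isupper()
upper_digits = '123'.isupper()

# strip removes whitespace by default, or any of the listed characters.
stripped_ws = '  hi \n'.strip()
stripped_chars = 'xyhiyx'.strip('xy')

# encode with different error handlers.
utf8_bytes = '中文'.encode('utf-8')
ignored = '中文'.encode('ascii', errors='ignore')
replaced = '中文'.encode('ascii', errors='replace')
try:
    '中文'.encode('ascii')  # errors='strict' is the default
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(upper_mixed, upper_digits, stripped_ws, stripped_chars)
print(utf8_bytes, ignored, replaced, strict_failed)
```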

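pickle.dumps() serializes a picklable object to bytes; pickle.loads() deserializes it (pickle.dump() and pickle.load() do the same with a file object)

json.dumps(d) returns a str containing standard JSON; json.loads() deserializes it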
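
A round-trip sketch for both modules, using a made-up dictionary d. The pickle output is bytes and is Python-specific, while the JSON output is text that other languages can read.

```python
import json
import pickle

d = {'name': 'Bob', 'age': 20, 'score': 88}  # sample data for illustration

pickled = pickle.dumps(d)         # bytes
restored = pickle.loads(pickled)  # back to an equal dict

text = json.dumps(d)              # str containing JSON
parsed = json.loads(text)         # back to an equal dict

print(type(pickled).__name__, restored)
print(type(text).__name__, text)
```

Unpickling data from an untrusted source can execute arbitrary code, so JSON is usually the safer choice for data that crosses a trust boundary.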

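time.strftime(format[, t]) takes a time tuple and returns it as a readable string laid out according to format. t is a struct_time object; if it is omitted, the current local time is used.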
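
A small sketch of strftime, assuming a fixed timestamp built with time.strptime so the output is reproducible; the sample date and variable names are made up.

```python
import time

# Build a struct_time from a known string, then format it back.
t = time.strptime('2018-02-01 09:30:00', '%Y-%m-%d %H:%M:%S')
formatted = time.strftime('%Y-%m-%d %H:%M:%S', t)
date_only = time.strftime('%Y/%m/%d', t)

# With t omitted, strftime formats the current local time.
now_text = time.strftime('%Y-%m-%d %H:%M:%S')

print(formatted, date_only, now_text)
```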

Original article: https://www.cnblogs.com/tony-/p/8397368.html