Scraping CVPR Papers

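1. Foreword

　　Python is a language I find both familiar and unfamiliar. Scraping the epidemic data earlier only taught me the basics, and scraping the CVPR papers this time took that a step further. Because I did not write the previous code entirely on my own, I ran into some difficulties this time: every step brought a new obstacle, and solving one problem exposed another. Still, the process was very rewarding.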

2. Source Code

'''
Created on April 14, 2020

@author: 26218
'''
from urllib.parse import urljoin
from urllib.request import urlopen

from bs4 import BeautifulSoup
import pymysql

BASE_URL = 'http://openaccess.thecvf.com/'

# Fetch and parse the CVPR 2018 open-access listing page
html = urlopen(BASE_URL + 'CVPR2018.py')
bs = BeautifulSoup(html, 'html.parser')

# Collect the absolute URL of every link that points to a PDF
where_list = []
for h in bs.find_all('a'):
    hh = h.get('href')
    if isinstance(hh, str) and hh.endswith('.pdf'):
        where_list.append(urljoin(BASE_URL, hh))

# Collect the paper titles; each title is the link text inside a <dt>
name_list = []
for dt in bs.find_all('dt'):
    link = dt.find('a')
    if link is not None:  # skip <dt> entries without a link
        name_list.append(link.get_text())

# Store (title, PDF link) rows in the `cvpr` table of the `cvpr` database.
# PyMySQL 1.0+ accepts connection parameters as keyword arguments only.
db = pymysql.connect(host='localhost', user='root', password='123456',
                     database='cvpr', charset='utf8')
try:
    with db.cursor() as cursor:
        sql = 'insert into cvpr (name, href) values (%s, %s)'
        # zip() stops at the shorter list, so a length mismatch cannot raise
        # IndexError; it does not guarantee that titles and links line up.
        cursor.executemany(sql, list(zip(name_list, where_list)))
    db.commit()
finally:
    db.close()
print('end')
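
The script above builds the title list and the link list in two separate passes, so the two lists can drift apart if a paper has more than one PDF link (for example, a supplementary file). One way to keep each title with its own link is sketched below. It is only an illustration: the SAMPLE markup is an assumed simplification of the listing page, the names PaperParser and parse_papers are hypothetical, and it uses the standard-library html.parser instead of BeautifulSoup so that it runs without extra packages.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = 'http://openaccess.thecvf.com/'

# Assumed, simplified markup: a <dt> holds the title link, and the
# <dd> elements that follow hold the PDF links for that paper.
SAMPLE = '''
<dl>
<dt class="ptitle"><a href="content_cvpr_2018/html/A.html">Paper A</a></dt>
<dd>Author One</dd>
<dd>[<a href="content_cvpr_2018/papers/A_paper.pdf">pdf</a>]
    [<a href="content_cvpr_2018/Supplemental/A_supp.pdf">supp</a>]</dd>
<dt class="ptitle"><a href="content_cvpr_2018/html/B.html">Paper B</a></dt>
<dd>Author Two</dd>
<dd>[<a href="content_cvpr_2018/papers/B_paper.pdf">pdf</a>]</dd>
</dl>
'''


class PaperParser(HTMLParser):
    """Collect one [title, pdf_url] record per <dt> entry."""

    def __init__(self):
        super().__init__()
        self.papers = []      # [title, pdf_url] records in page order
        self._in_dt = False   # True while the parser is inside a <dt>

    def handle_starttag(self, tag, attrs):
        if tag == 'dt':
            self._in_dt = True
            self.papers.append(['', None])  # start a new record
        elif tag == 'a' and not self._in_dt and self.papers:
            href = dict(attrs).get('href')
            # Keep only the first PDF link that follows the title
            if href and href.endswith('.pdf') and self.papers[-1][1] is None:
                self.papers[-1][1] = urljoin(BASE_URL, href)

    def handle_endtag(self, tag):
        if tag == 'dt':
            self._in_dt = False

    def handle_data(self, data):
        if self._in_dt:
            self.papers[-1][0] += data  # title text inside the <dt>


def parse_papers(html_text):
    """Return (title, pdf_url) tuples, skipping titles without a PDF."""
    parser = PaperParser()
    parser.feed(html_text)
    return [(title.strip(), url) for title, url in parser.papers if url]


for title, url in parse_papers(SAMPLE):
    print(title, url)
```

Because each record is created when a title is seen and filled in by the first PDF link after it, the supplementary link of Paper A is ignored rather than shifting every later pairing by one row. The resulting tuples can be passed directly to cursor.executemany.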

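3. Personal Reflections

　　Teamwork does accomplish more than one person working alone, and more efficiently, but a member who does not finish their own task on time holds the whole team back.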

Original post: https://www.cnblogs.com/suanai/p/12725861.html