The functionality still needs work. It can scrape the whole novel, but it is far too slow; it uses no proxy and sets no request headers. Fortunately the site places no restrictions on crawlers.
import requests
from bs4 import BeautifulSoup

def get_url_list(url):
    # Collect every chapter link from the novel's table of contents page.
    content = requests.get(url).content
    soup = BeautifulSoup(content, 'lxml')
    url_list = []
    for i in soup.select('#list dl dd a'):
        url_list.append('http://www.biquge.info/0_921/' + i.get('href'))
    return url_list

def get_data(url):
    # Download one chapter and append its title and body to the output file.
    content = requests.get(url).content
    soup = BeautifulSoup(content, 'lxml')
    # get_text() replaces the original chain of str()/replace() calls,
    # which left stray '[' and ']' from the list repr in the saved text.
    text = soup.select_one('#content').get_text(' ')
    title = soup.select_one('.content_read .box_con .bookname h1').get_text()
    with open(r'F:\栋歌第一代爬虫.txt', 'a+', encoding='utf-8') as f:
        f.write(title + ' ' + text)
    print(title)

if __name__ == "__main__":
    url = 'http://www.biquge.info/0_921/'
    url_list = get_url_list(url)
    for i in url_list:
        get_data(i)
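The two weaknesses noted above (no headers, slow sequential fetching) can both be addressed with a shared requests.Session and a thread pool. A minimal sketch, assuming the same page structure and selectors as the script above; the User-Agent string, worker count, and the commented-out proxy address are placeholder choices, not values from the original script:

import requests
from bs4 import BeautifulSoup
from concurrent.futures import ThreadPoolExecutor

# Reuse one connection pool and send a browser-like User-Agent.
session = requests.Session()
session.headers.update({'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'})
# To route through a proxy, uncomment and fill in (address is hypothetical):
# session.proxies = {'http': 'http://127.0.0.1:1080'}

def fetch_chapter(url):
    soup = BeautifulSoup(session.get(url).content, 'lxml')
    title = soup.select_one('.content_read .box_con .bookname h1').get_text()
    text = soup.select_one('#content').get_text(' ')
    return title, text

def crawl(url_list, path):
    # executor.map preserves input order, so chapters land in the file
    # in sequence even though they are downloaded concurrently.
    with ThreadPoolExecutor(max_workers=8) as executor, \
         open(path, 'w', encoding='utf-8') as f:
        for title, text in executor.map(fetch_chapter, url_list):
            f.write(title + ' ' + text + '\n')
            print(title)

# Usage (reuses get_url_list from the script above):
# crawl(get_url_list('http://www.biquge.info/0_921/'), r'F:\栋歌第一代爬虫.txt')

Even with threads, a polite crawler should keep the worker count small; eight concurrent requests is already a noticeable load on a site that, as noted above, does nothing to limit crawlers.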