• bs4 in practice: scraping Romance of the Three Kingdoms



    # Goal: scrape the chapter titles and chapter text of the novel
    # "Romance of the Three Kingdoms" from http://www.shicimingju.com/book/sanguoyanyi.html
    import requests
    from bs4 import BeautifulSoup

    if __name__ == "__main__":
        # Fetch the index (table of contents) page
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'
        }  # spoof the User-Agent so the request looks like it comes from a browser
        url = 'http://www.shicimingju.com/book/sanguoyanyi.html'
        page_text = requests.get(url=url, headers=headers).text

        # Parse the chapter titles and detail-page URLs out of the index page
        # 1. Instantiate a BeautifulSoup object and load the page source into it
        soup = BeautifulSoup(page_text, 'lxml')
        # 2. Select every chapter <li> in the table of contents
        li_list = soup.select('.book-mulu > ul > li')

        # The with block closes the file even if a request fails midway
        with open('./sanguo.txt', 'w', encoding='utf-8') as fp:
            for li in li_list:
                title = li.a.string  # TODO
                detail_url = 'http://www.shicimingju.com' + li.a['href']
                # Request the detail page and parse out the chapter text
                detail_page_text = requests.get(url=detail_url, headers=headers).text
                detail_soup = BeautifulSoup(detail_page_text, 'lxml')
                div_tag = detail_soup.find('div', class_='chapter_content')
                # .text is a property, not a method; it returns all text nested under the tag
                content = div_tag.text
                fp.write(title + ':' + content + '\n')
                print(title, 'scraped successfully')
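
The index-parsing step can be tried without touching the network. The sketch below is only an illustration: `INDEX_HTML` is a hypothetical snippet that imitates the `.book-mulu > ul > li` structure the script selects (it is not copied from the real site), and `parse_index` is a helper name introduced here. It uses the built-in `html.parser` so that `lxml` is not required, builds absolute URLs with `urljoin`, and shows why `li.a.string` can be fragile: `.string` is `None` when the tag has more than one child, while `get_text()` still returns the full text.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical HTML imitating the structure the script expects;
# the real page may differ.
INDEX_HTML = """
<div class="book-mulu">
  <ul>
    <li><a href="/book/sanguoyanyi/1.html">Chapter 1</a></li>
    <li><a href="/book/sanguoyanyi/2.html">Chapter <b>2</b></a></li>
  </ul>
</div>
"""

BASE_URL = "http://www.shicimingju.com"


def parse_index(html, base_url=BASE_URL):
    """Return a list of (title, absolute_url) pairs found in an index page."""
    soup = BeautifulSoup(html, "html.parser")
    chapters = []
    for li in soup.select(".book-mulu > ul > li"):
        a = li.a
        # Skip list items that carry no usable link
        if a is None or not a.get("href"):
            continue
        # get_text() concatenates all nested text, unlike .string
        title = a.get_text().strip()
        chapters.append((title, urljoin(base_url, a["href"])))
    return chapters


chapters = parse_index(INDEX_HTML)

# The second link contains a nested <b> tag, so .string is None for it
second_link = BeautifulSoup(INDEX_HTML, "html.parser").select(
    ".book-mulu > ul > li"
)[1].a

for title, link in chapters:
    print(title, link)
print("second link .string:", second_link.string)
```

If the real chapter links always contain plain text only, `li.a.string` and `get_text()` give the same result; the difference matters only when a title has nested markup.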


  • Original article: https://www.cnblogs.com/huahuawang/p/12692354.html