• Using BeautifulSoup for a second-level search within tags: printing the contents of a NetEase Cloud Music playlist as an example


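    Take this NetEase Cloud Music playlist as an example: https://music.163.com/#/playlist?id=3212113629

    The pitfall is the URL to request. As the screenshot in the original post shows, the address to request is https://music.163.com/playlist?id=3212113629, not the directly copied https://music.163.com/#/playlist?id=3212113629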
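
    The conversion described above can be sketched in a few lines. This is only an illustration: the helper name strip_hash_route is hypothetical (it is not part of the original post), and it assumes the copied address always has the form ".../#/path?query".

```python
from urllib.parse import urlsplit, urlunsplit


def strip_hash_route(url):
    """Turn a copied '#/' address into the URL that should be requested.

    Hypothetical helper for illustration; assumes the fragment looks
    like '/playlist?id=...'.
    """
    parts = urlsplit(url)
    fragment = parts.fragment
    if not fragment.startswith("/"):
        # No hash route present, so leave the URL unchanged.
        return url
    # Everything after '#' is the fragment; split it into path and query.
    path, _, query = fragment.partition("?")
    return urlunsplit((parts.scheme, parts.netloc, path, query, ""))


copied = "https://music.163.com/#/playlist?id=3212113629"
print(strip_hash_route(copied))
```

    A plain copied.replace("/#/", "/", 1) would also work for this one address; the urlsplit version just makes the "fragment becomes path and query" step explicit.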

     Getting started

    First, fetch the page HTML:

    import requests
    from bs4 import BeautifulSoup
    
    url = "https://music.163.com/playlist?id=3212113629"    # note: the copied address contains "#/", which must be removed here
    demo = requests.get(url).text
    soup = BeautifulSoup(demo, "html.parser")

    Printing soup shows that the content we want is inside ul tags:

    for ss in soup.find_all('ul'):
        print(ss)

    Running this shows that the page contains many ul tags,

     so we narrow the result down with a second, more specific search:

    for ss0 in soup.find_all('ul',{"class":"f-hide"}):
        print(ss0)
    or, using the class_ keyword argument:
    for ss0 in soup.find_all('ul', class_="f-hide"):
        print(ss0)
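
    To try the two filter forms without touching the network, here is a small sketch on inline HTML. The markup in SAMPLE_HTML is an assumption made up for illustration; it only mimics the idea of several ul tags, one of which has class "f-hide".

```python
from bs4 import BeautifulSoup

# Assumed stand-in for the real page: two <ul> tags, one with class "f-hide".
SAMPLE_HTML = """
<ul class="nav"><li><a href="/discover">Discover</a></li></ul>
<ul class="f-hide">
  <li><a href="/song?id=1">Song A</a></li>
  <li><a href="/song?id=2">Song B</a></li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

all_lists = soup.find_all("ul")                      # every <ul> on the page
by_dict = soup.find_all("ul", {"class": "f-hide"})   # attribute-dict form
by_keyword = soup.find_all("ul", class_="f-hide")    # class_ keyword form

# Both filtered searches should select the same single tag.
print(len(all_lists), len(by_dict), by_dict == by_keyword)
```

    The keyword is spelled class_ with a trailing underscore because class is a reserved word in Python.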

     

     The complete code:

    # write by xdd1997  xdd2026@qq.com
    # 2020-08-07
    
    import requests
    from bs4 import BeautifulSoup
    
    url = "https://music.163.com/playlist?id=3212113629"
    demo = requests.get(url).text
    soup = BeautifulSoup(demo, "html.parser")
    
    for ss0 in soup.find_all('ul',{"class":"f-hide"}):
        for ii in ss0.find_all('a'):
            print(ii.string)
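
    As an alternative to the two nested loops, the same "a tags inside ul.f-hide" lookup can be written as one CSS selector with select(). This is a sketch on assumed inline HTML, not the method used in the original post; SAMPLE_HTML is invented for illustration.

```python
from bs4 import BeautifulSoup

# Assumed stand-in for the playlist page markup.
SAMPLE_HTML = """
<ul class="nav"><li><a href="/discover">Discover</a></li></ul>
<ul class="f-hide">
  <li><a href="/song?id=1">Song A</a></li>
  <li><a href="/song?id=2">Song B</a></li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Two-step search: first the <ul class="f-hide">, then the <a> tags inside it.
nested = [
    a.get_text()
    for ul in soup.find_all("ul", class_="f-hide")
    for a in ul.find_all("a")
]

# One-step search: descendant selector "ul.f-hide a".
selected = [a.get_text() for a in soup.select("ul.f-hide a")]

print(nested)
print(nested == selected)
```

    The two-step form is easier to debug, since each intermediate ul can be printed on its own; the selector form is shorter once the structure is known.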


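    Update 2020-08-20: added request headers

    After I ran the script a few more times the next day, it stopped working. NetEase Cloud Music presumably has an anti-scraping check on where requests come from, so I added a header to the request: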

    #encoding = utf8
    # write by xdd1997  xdd2026@qq.com
    # 2020-08-20
    
    import requests
    from bs4 import BeautifulSoup
    url = "https://music.163.com/playlist?id=5138652624"   # note: remove the "#/" from the directly copied address
    kv = {'user-agent': 'Mozilla/5.0'}   # browser-like User-Agent to get past the crawler check
    try:
        r = requests.get(url, headers=kv)
        r.raise_for_status()   # raises an exception if the response status is 4xx or 5xx
        r.encoding = r.apparent_encoding
    except requests.RequestException:
        print("Failed to access the site")
        raise SystemExit(1)   # stop here; there is no response to parse

    demo = r.text
    soup = BeautifulSoup(demo, "html.parser")
    # print(soup)

    index = 0
    for ss in soup.find_all('ul', {"class": "f-hide"}):   # find <ul class="f-hide"> ... </ul>
        for ii in ss.find_all('a'):
            # print(ii.string)
            index = index + 1
            # get_text() avoids a TypeError if ii.string is None
            print(str(index) + ' ' + '点歌 ' + ii.get_text())

    '''
    for i in soup.ul.descendants:
        print(i.string)
    print('------------------------------------------')
    for i in soup.ul.children:
        print(i.string)
    '''
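
    To check which User-Agent would actually be sent without making a request, the request can be prepared but not sent. This is a minimal sketch, assuming a requests.Session is used instead of the one-off requests.get call in the post; nothing here contacts the server.

```python
import requests

url = "https://music.163.com/playlist?id=5138652624"

# A session applies the same headers to every request made through it.
session = requests.Session()
session.headers.update({"User-Agent": "Mozilla/5.0"})

# Build the request without sending it, then inspect the final headers.
prepared = session.prepare_request(requests.Request("GET", url))
print(prepared.headers["User-Agent"])
```

    Sending it later would be session.send(prepared, timeout=10); the timeout value is an arbitrary choice here, added so that a stalled connection does not hang the script.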

     


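    Reposted material is for learning purposes only and will not be used commercially.
    Reposting of original content is welcome; please include a link to the post.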
  • Original post: https://www.cnblogs.com/xdd1997/p/13530770.html