• 23 - Python: using BeautifulSoup to extract all the data in a tags


    1. Getting child tags:

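    # Assumes `re` is imported and `soup` is an existing BeautifulSoup object.
    thr_msgs = soup.find_all('div',class_=re.compile('msg'))

     
    for i in thr_msgs:
        print(i)
        # CSS selector: the <em> that is the first <em> among its siblings
        first = i.select('em:nth-of-type(1)')
        print(first)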
     
     
     
    >>>
     
    <div class='"msg"'><em>佛山</em><em>1-3年</em><em>大专</em></div>
    [<em>佛山</em>]
    <div class='"msg"'><em>南京</em><em>3-5年</em><em>本科</em></div>
    [<em>南京</em>]
    <div class='"msg"'><em>南阳</em><em>1-3年</em><em>大专</em></div>
    [<em>南阳</em>]
    <div class='"msg"'><em>深圳</em><em>1年以内</em><em>本科</em></div>
    [<em>深圳</em>]
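
The snippet above depends on a `soup` object built elsewhere. A minimal self-contained sketch of the same idea follows, assuming inline sample HTML modelled on the output shown above (with plain `class="msg"` attributes). The names `rows` and `first_items` are illustrative and not from the original post. `select_one` returns a single tag rather than a list, and `find_all('em')` collects every `<em>` in the div.

```python
import re
from bs4 import BeautifulSoup

# Sample markup modelled on the output above (assumed, not the original page).
html = """
<div class="msg"><em>佛山</em><em>1-3年</em><em>大专</em></div>
<div class="msg"><em>南京</em><em>3-5年</em><em>本科</em></div>
"""
soup = BeautifulSoup(html, "html.parser")

rows = []         # every <em> text, one list per div
first_items = []  # only the first <em> text of each div

for div in soup.find_all("div", class_=re.compile("msg")):
    # All <em> tags inside this div, in document order.
    rows.append([em.get_text() for em in div.find_all("em")])
    # select_one returns the first match (or None) instead of a list.
    first_em = div.select_one("em:nth-of-type(1)")
    if first_em is not None:
        first_items.append(first_em.get_text())

print(rows)
print(first_items)
```

Using `get_text()` here yields plain strings, whereas `select` in the original returns a list of tag objects.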

    2. Getting the contents of a tag:

    Original: https://blog.csdn.net/suibianshen2012/article/details/62040460?utm_source=copy

    # -*- coding:utf-8 -*-
    # Python 3 (urllib.request is not available in Python 2.7)
    # XiaoDeng
    # http://tieba.baidu.com/p/2460150866
    # Tag operations
    
    
    from bs4 import BeautifulSoup
    import urllib.request
    import re
    
    
    # If the source is a URL, the page can be read this way
    #html_doc = "http://tieba.baidu.com/p/2460150866"
    #req = urllib.request.Request(html_doc) 
    #webpage = urllib.request.urlopen(req) 
    #html = webpage.read()
    
     
    
    html="""
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title" name="dromouse"><b>The Dormouse's story</b></p>
    <p class="story">Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="xiaodeng"><!-- Elsie --></a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    <a href="http://example.com/lacie" class="sister" id="xiaodeng">Lacie</a>
    and they lived at the bottom of a well.</p>
    <p class="story">...</p>
    """
    soup = BeautifulSoup(html, 'html.parser') # document object
    
    
    # soup.a looks up the a tag, but returns only the first one
    #print(soup.a)#<a class="sister" href="http://example.com/elsie" id="xiaodeng"><!-- Elsie --></a>
    
    for k in soup.find_all('a'):
        print(k)
        print(k['class'])  # the a tag's class attribute (a list)
        print(k['id'])     # the a tag's id value
        print(k['href'])   # the a tag's href value
        print(k.string)    # the a tag's string
        # tag.get('class') achieves the same effect as tag['class']
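
Two details of the loop above are easy to miss: `k['attr']` raises `KeyError` when the attribute is absent, while `k.get('attr')` returns `None`; and `.string` on the first link returns a `Comment` object, because that tag contains only an HTML comment. The sketch below shows one way to handle both, assuming a trimmed copy of the sample HTML. The `links` list and the `title` lookup are illustrative additions, not part of the original post.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment

# Trimmed copy of the sample document used above.
html = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="xiaodeng"><!-- Elsie --></a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
"""
soup = BeautifulSoup(html, "html.parser")

links = []
for a in soup.find_all("a"):
    text = a.string
    # A tag holding only an HTML comment yields a Comment, not visible text.
    if isinstance(text, Comment):
        text = None
    links.append({
        "id": a.get("id"),
        "href": a.get("href"),
        "class": a.get("class"),  # multi-valued attribute, returned as a list
        "title": a.get("title"),  # attribute not present, so get() gives None
        "text": text,
    })

for link in links:
    print(link)
```

Preferring `get()` keeps the loop from stopping on the first tag that lacks an attribute, which matters on real pages where markup is inconsistent.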
    

  • Original address: https://www.cnblogs.com/zhumengdexiaobai/p/9781061.html