• The pyquery library


    pyquery is another library for selecting elements out of an HTML document.

    The usual import is: from pyquery import PyQuery as pq

    1. Initializing from a string

    html = '''
    <div>
        <ul>
             <li class="item-0">first item</li>
             <li class="item-1"><a href="link2.html">second item</a></li>
             <li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
             <li class="item-1 active"><a href="link4.html">fourth item</a></li>
             <li class="item-0"><a href="link5.html">fifth item</a></li>
         </ul>
     </div>
    '''
    from pyquery import PyQuery as pq
    doc = pq(html)
    print(doc('li'))

    pq(html) returns a PyQuery object, doc.

    Calling doc('li') returns all li tags in doc:

    <li class="item-0">first item</li>
             <li class="item-1"><a href="link2.html">second item</a></li>
             <li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
             <li class="item-1 active"><a href="link4.html">fourth item</a></li>
             <li class="item-0"><a href="link5.html">fifth item</a></li>
    Result of initializing from a string

    2. Initializing from a URL

    from pyquery import PyQuery as pq
    doc = pq(url='http://www.baidu.com')
    print(doc('head'))

    3. Initializing from a file

    from pyquery import PyQuery as pq
    doc = pq(filename='demo.html')
    print(doc('li'))

    4. CSS selectors

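    For example, .list .item-0.active (note the space after .list) matches tags nested inside an element with class list that carry both the item-0 and active classes.

    li.siblings() returns all tags at the same level as the li tag, excluding li itself.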

    5. Finding elements

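    Use find to search inside a selection. For example, items = doc('.list') followed by lis = items.find('li') returns the li tags under the element with class list.

    Parent element: container = items.parent() returns the tag one level up.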

    6. Iterating over elements

    html = '''
    <div class="wrap">
        <div id="container">
            <ul class="list">
                 <li class="item-0">first item</li>
                 <li class="item-1"><a href="link2.html">second item</a></li>
                 <li class="item-2 active"><a href="link3.html"><span class="bold">third item</span></a></li>
                 <li class="item-3 active"><a href="link4.html">fourth item</a></li>
                 <li class="item-4"><a href="link5.html">fifth item</a></li>
             </ul>
         </div>
     </div>
    '''
    from pyquery import PyQuery as pq
    doc = pq(html)
    li = doc('.item-2.active')
    print(li)
    # Iterate over every li: items() yields one PyQuery object per element
    lis = doc('li').items()
    print(type(lis))
    for li in lis:
        print(li)

    items() returns a generator (generators are covered in the Python basics notes).

    The loop prints each element of lis:

    <class 'generator'>
    <li class="item-0">first item</li>
                 
    <li class="item-1"><a href="link2.html">second item</a></li>
                 
    <li class="item-2 active"><a href="link3.html"><span class="bold">third item</span></a></li>
                 
    <li class="item-3 active"><a href="link4.html">fourth item</a></li>
                 
    <li class="item-4"><a href="link5.html">fifth item</a></li>
    Result of iterating over the elements

    7. Getting attributes

    Use a.attr('href') or a.attr.href

    from pyquery import PyQuery as pq
    doc = pq(html)
    a = doc('.item-2.active a')
    print(a)
    print(a.attr('href'))
    print(a.attr.href)

    The result is:

    <a href="link3.html"><span class="bold">third item</span></a>
    link3.html
    link3.html

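    This selects the a tag and then looks up the href attribute on that tag. If the tag has no such attribute, the result is None; other elements are not searched.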

    8. Getting text content

    from pyquery import PyQuery as pq
    doc = pq(html)
    a = doc('.item-2.active a')
    print(a)
    print(a.text())

    text() returns all the text inside the a tag, including text inside its child tags. For example, given

    <li class="item-2 active"><a href="link3.html">hello world<span class="bold">third item</span></a></li>

    both pieces of text are shown: hello world and third item.

    9. DOM manipulation

    removeClass  addClass

    from pyquery import PyQuery as pq
    doc = pq(html)
    li = doc('.item-2.active')
    print(li)
    li.removeClass('active')
    print(li)
    li.addClass('active')
    print(li)

    removeClass('active') removes active from the class attribute of the li tag held in li.

    addClass('active') then adds it back.

    attr   css

    from pyquery import PyQuery as pq
    doc = pq(html)
    li = doc('.item-2.active')
    print(li)
    li.attr('name', 'link')
    print(li)
    li.css('font-size', '14px')
    print(li)

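    attr('name', 'link') adds a name attribute, and css('font-size', '14px') adds a font-size entry to the tag's style attribute.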

    remove

    html = '''
    <div class="wrap">
        Hello, World
        <p>This is a paragraph.</p>
     </div>
    '''
    from pyquery import PyQuery as pq
    doc = pq(html)
    wrap = doc('.wrap')
    print(wrap.text())
    wrap.find('p').remove()
    print(wrap.text())
    
    
    # Output:
    # Hello, World This is a paragraph.
    # Hello, World

    After the p child is removed, text() returns only the text directly under the div tag, without the child tag's content.

    10. Pseudo-class selectors

    from pyquery import PyQuery as pq
    doc = pq(html)                  # html here is the five-item list from section 6
    li = doc('li:first-child')
    print(li)                       # <li class="item-0">first item</li>
    li = doc('li:last-child')
    print(li)                       # <li class="item-4"><a href="link5.html">fifth item</a></li>
    li = doc('li:nth-child(2)')
    print(li)                       # <li class="item-1"><a href="link2.html">second item</a></li>
    li = doc('li:gt(2)')
    print(li)                       # <li class="item-3 active"><a href="link4.html">fourth item</a></li>
                                    # <li class="item-4"><a href="link5.html">fifth item</a></li>
    li = doc('li:nth-child(2n)')
    print(li)                       # <li class="item-1"><a href="link2.html">second item</a></li>
                                    # <li class="item-3 active"><a href="link4.html">fourth item</a></li>
    li = doc('li:contains(second)')
    print(li)                       # <li class="item-1"><a href="link2.html">second item</a></li>

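    Note that nth-child counts from 1 (li:nth-child(2) is the second li), whereas the index in :gt(n) is 0-based (li:gt(2) selects the fourth li onward).

    :contains(xxx) matches tags whose text contains xxx.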

  • Original article: https://www.cnblogs.com/yxlll/p/13589832.html