• Using XPath to extract the text inside all tags under a node, recursively: //text()


    Use XPath to extract the text inside every tag under a node, even when the tag names differ.

    # -*- coding: utf-8 -*-
    from lxml import etree

    # Sample page: each <ul> mixes <li> with a non-standard <ml> tag.
    html = '''
    <!DOCTYPE html>
    <html>
    <head lang="en">
        <meta charset="UTF-8">
        <title>测试-常规用法</title>
    </head>
    <body>
    <div id="content">
        <ul id="useful">
        <li>我</li>
        <ml>是</ml>
        <li>谁</li>
        </ul>
        <ul id="useless">
        <li>who </li>
        <li>am </li>
        <li>i!</li>
        </ul>
    </div>
    <div id="content">
        <ul id="useful"><li>你</li><ml>是</ml><li>谁!</li>
        </ul>
        <ul id="useless"><li>who </li><li>you </li><li>are!</li>
        </ul>
    </div>

    </body>
    </html>
    '''
    selector = etree.HTML(html)
    # XPath positions are 1-based: visit the first and second <div id="content">.
    for k in range(1, 3):
        # //text() returns every text node below the <ul>, whatever the tag name.
        chinese = selector.xpath('//div[@id="content"][%d]/ul[@id="useful"]//text()' % k)
        data = "".join(chinese)
        english = selector.xpath('//div[@id="content"][%d]/ul[@id="useless"]//text()' % k)
        data_en = "".join(english)
        print(data)
        print(data_en)
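
    To make the role of the double slash concrete, here is a minimal sketch comparing /text() with //text() on a reduced copy of the second list. The names snippet, root, direct and nested are illustrative and do not come from the original post.

```python
from lxml import etree

# Illustrative fragment, reduced from the second <ul id="useful"> above.
snippet = '<ul id="useful"><li>你</li><ml>是</ml><li>谁!</li></ul>'
root = etree.HTML(snippet)

# /text() selects only text nodes that are direct children of <ul>.
direct = root.xpath('//ul[@id="useful"]/text()')

# //text() selects text nodes at any depth below <ul>,
# so <li> and <ml> contribute equally.
nested = root.xpath('//ul[@id="useful"]//text()')

print(direct)
print("".join(nested))
```

    Here /text() finds nothing because the <ul> has no text of its own, while //text() reaches into every child tag.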

    Result:
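
    For the first block, the joined strings keep the newlines and indentation that sit between the tags, because //text() also returns whitespace-only text nodes. A minimal sketch of trimming them, assuming a reduced copy of the first list; the names snippet, pieces, raw and clean are illustrative and not part of the original code.

```python
from lxml import etree

# Illustrative fragment, reduced from the first <ul id="useful"> above.
snippet = '''
<ul id="useful">
    <li>我</li>
    <ml>是</ml>
    <li>谁</li>
</ul>
'''
root = etree.HTML(snippet)
pieces = root.xpath('//ul[@id="useful"]//text()')

# Joining as-is keeps the whitespace between the tags.
raw = "".join(pieces)

# Stripping each piece first leaves only the visible characters.
clean = "".join(piece.strip() for piece in pieces)

print(repr(raw))
print(clean)
```

    Stripping each piece also removes the trailing space in items such as "who ", so for the English lists joining with a single space after stripping may be the better choice.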

  • Original article: https://www.cnblogs.com/lovychen/p/5671287.html