• 【python】lxml


    Source: http://lxml.de/tutorial.html

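    lxml is a very powerful Python library for working with XML. It makes parsing and generating XML documents straightforward. What follows is a translation of part of the tutorial linked above.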

    1. Creating an empty XML element

    from lxml import etree
    
    root = etree.Element("root")
    print(etree.tostring(root, pretty_print=True, encoding="unicode"))
    <root/>
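    In Python 3, `etree.tostring()` returns `bytes` by default, which is why printing it directly shows `b'...'`. A minimal sketch of the common output options, all standard `tostring()` keyword arguments: `encoding="unicode"` for a `str`, and `xml_declaration=True` with a byte encoding to prepend an XML declaration.

```python
from lxml import etree

root = etree.Element("root")

# Default: serialized as bytes
as_bytes = etree.tostring(root)
# encoding="unicode" returns a Python str instead
as_text = etree.tostring(root, encoding="unicode")
# With a byte encoding, an XML declaration can be prepended
with_decl = etree.tostring(root, xml_declaration=True, encoding="UTF-8")

print(as_bytes)   # b'<root/>'
print(as_text)    # <root/>
print(with_decl)
```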

    2. Creating child elements

    from lxml import etree
    
    root = etree.Element("root")
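    root.append(etree.Element("child1"))       # method 1: create the element, then append it
    child2 = etree.SubElement(root, "child2")  # method 2: SubElement creates and appends in one step
    child3 = etree.SubElement(root, "child3")
    print(etree.tostring(root, pretty_print=True, encoding="unicode"))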
    <root>
      <child1/>
      <child2/>
      <child3/>
    </root>
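    An element's children behave much like a Python list. A small sketch (the element names are only illustrative) showing `len()`, indexing, iteration over direct children and `insert()`:

```python
from lxml import etree

root = etree.Element("root")
for name in ("child1", "child2", "child3"):
    etree.SubElement(root, name)

print(len(root))                      # 3 direct children
print(root[0].tag)                    # child1
print([child.tag for child in root])  # iterating an element yields its direct children

# insert() places a new element at the given position
root.insert(0, etree.Element("child0"))
print([child.tag for child in root])  # ['child0', 'child1', 'child2', 'child3']
```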

    3. Creating an element with text content

    from lxml import etree
    
    root = etree.Element("root")
    root.text = "Hello World"
    print(etree.tostring(root, pretty_print=True, encoding="unicode"))
    <root>Hello World</root>
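    Text assigned to `.text` is escaped automatically when the tree is serialized, so reserved characters need no manual handling. A short sketch:

```python
from lxml import etree

root = etree.Element("root")
print(root.text)  # None: a new element has no text yet

root.text = "a < b & c"
# '<' and '&' are escaped on serialization; .text itself keeps the raw string
xml = etree.tostring(root, encoding="unicode")
print(xml)  # <root>a &lt; b &amp; c</root>
```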

    4. Attributes

    In lxml, an element's attributes are stored and accessed like a dictionary.

    Setting attributes

    from lxml import etree
    
    root = etree.Element("root", interesting="totally")  # method 1: keyword arguments
    root.set("hello", "huhu")  # method 2: set()
    root.text = "Hello World"
    print(etree.tostring(root, encoding="unicode"))
    <root interesting="totally" hello="huhu">Hello World</root>
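    Keyword arguments only work for attribute names that are valid Python identifiers. For a name such as `data-id` (a hypothetical attribute used here for illustration), the attributes can instead be passed as a dictionary, which is the second positional argument of `Element()`:

```python
from lxml import etree

# "data-id" cannot be written as a keyword argument, so pass a dict instead
root = etree.Element("root", {"data-id": "42"})
root.set("lang", "en")  # set() also accepts any attribute name

xml = etree.tostring(root, encoding="unicode")
print(xml)  # <root data-id="42" lang="en"/>
```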

    Reading attributes

    Method 1:

    print(root.get("interesting"))
    print(root.get("hello"))
    totally
    huhu

    Method 2:

    attributes = root.attrib
    print(attributes["interesting"])

    Iterating over attributes

    for name, value in sorted(root.items()):
        print('%s = %r' % (name, value))
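    Because `.attrib` is dictionary-like, the usual dict operations apply to it. A sketch that rebuilds the attributes from the example above:

```python
from lxml import etree

root = etree.Element("root", interesting="totally")
root.set("hello", "huhu")

attrs = root.attrib                # live, dict-like view of the element's attributes
print("hello" in attrs)            # True
print(root.get("missing", "n/a"))  # get() takes a default, like dict.get()

del attrs["hello"]                 # deleting through the view updates the element
print(root.keys())                 # ['interesting']

plain = dict(root.attrib)          # an independent plain-dict copy
print(plain)                       # {'interesting': 'totally'}
```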

    5. Mixed content: .text and .tail

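    In XML like the following, the text is split by a <br/> element. The text after <br/> is not part of body.text; it is stored in the .tail of the <br/> element:

    <html><body>Hello<br/>World</body></html>

    from lxml import etree

    html = etree.Element("html")
    body = etree.SubElement(html, "body")
    body.text = "Hello"   # text before <br/>
    br = etree.SubElement(body, "br")
    br.tail = "World"     # text after <br/>
    print(etree.tostring(html, encoding="unicode"))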
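    To collect all the text of a subtree in document order, including tails, `itertext()` or `tostring(..., method="text")` can be used. A sketch that rebuilds the `Hello<br/>World` example:

```python
from lxml import etree

html = etree.Element("html")
body = etree.SubElement(html, "body")
body.text = "Hello"              # text before the first child
br = etree.SubElement(body, "br")
br.tail = "World"                # text after <br/>, stored on the <br/> element

print(body.text, br.tail)        # Hello World
print("".join(body.itertext()))  # HelloWorld
print(etree.tostring(html, method="text", encoding="unicode"))  # HelloWorld
```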

    6. Iteration

    Iterating over all elements

    for element in root.iter():
        print("%s - %s" % (element.tag, element.text))

    To visit only the elements with a given tag name (at any depth, not just direct children), pass that name to iter():

    for element in root.iter("child"):
        print("%s - %s" % (element.tag, element.text))
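    `iter()` walks the whole subtree, including the element itself, whereas iterating over an element directly yields only its direct children. A sketch on a small hypothetical tree:

```python
from lxml import etree

root = etree.XML("<root><child>1</child><group><child>2</child></group></root>")

# Direct children only
print([el.tag for el in root])                 # ['child', 'group']
# All descendants named "child", at any depth
print([el.text for el in root.iter("child")])  # ['1', '2']
# iter() with no argument also yields root itself
print([el.tag for el in root.iter()])          # ['root', 'child', 'group', 'child']
```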

    7. Extracting text with XPath

    build_text_list = etree.XPath("//text()") # lxml.etree only!
    print(build_text_list(html))
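    Besides compiled `etree.XPath` objects, every element has an `.xpath()` method, and a compiled expression can take XPath variables as keyword arguments. A sketch on a small sample document (the same one used in the next section):

```python
from lxml import etree

root = etree.XML("<root><a x='123'>aText<b/><c/><b/></a></root>")

# .xpath() on an element returns a list of matches for node-set expressions
print([el.tag for el in root.xpath("//b")])  # ['b', 'b']
# Numeric XPath functions return a float
print(root.xpath("count(//b)"))              # 2.0

# Compiled expression with a $val variable bound at call time
find_by_x = etree.XPath("//*[@x = $val]")
print([el.tag for el in find_by_x(root, val="123")])  # ['a']
```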

    8. Finding elements

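    iterfind(): iterates over all elements that match a path expression

    findall(): returns a list of the matching elements

    find(): returns the first matching element (None if nothing matches)

    findtext(): returns the .text of the first matching element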

    Given the following XML:

    root = etree.XML("<root><a x='123'>aText<b/><c/><b/></a></root>")

    Finding a direct child (a bare tag name only matches direct children, which is why "b" is not found)

    >>> print(root.find("b"))
    None
    >>> print(root.find("a").tag)
    a

    Finding elements anywhere in the tree (".//" searches all descendants)

    >>> print(root.find(".//b").tag)
    b
    >>> [ b.tag for b in root.iterfind(".//b") ]
    ['b', 'b']

    Finding elements that have a given attribute

    >>> print(root.findall(".//a[@x]")[0].tag)
    a
    >>> print(root.findall(".//a[@y]"))
    []
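    The examples above do not exercise `findtext()`. A short sketch on the same XML: it returns the text of the first match, an empty string when the match has no text, and the `default` value when nothing matches:

```python
from lxml import etree

root = etree.XML("<root><a x='123'>aText<b/><c/><b/></a></root>")

print(root.findtext("a"))                       # aText
print(repr(root.findtext("a/b")))               # '' (found, but <b/> has no text)
print(root.findtext("missing", default="n/a"))  # n/a (no match)
print(len(root.findall("a/b")))                 # 2
```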

    9. Parsing a string into XML

    >>> some_xml_data = "<root>data</root>"
    
    >>> root = etree.fromstring(some_xml_data)
    >>> print(root.tag)
    root
    >>> etree.tostring(root)
    b'<root>data</root>'
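    For data read from files or sockets, `etree.parse()` accepts a filename or a file-like object, and malformed input raises `etree.XMLSyntaxError`. Input that carries an encoding declaration should be passed as bytes (a `str` containing a declaration raises `ValueError`). A sketch that uses an in-memory `io.BytesIO` in place of a real file:

```python
import io
from lxml import etree

data = b"<?xml version='1.0' encoding='UTF-8'?><root>data</root>"

# Bytes input may carry an encoding declaration
print(etree.fromstring(data).text)  # data

# parse() takes a filename or file-like object and returns an ElementTree
tree = etree.parse(io.BytesIO(data))
print(tree.getroot().tag)  # root

# Malformed XML (unclosed <root>) raises XMLSyntaxError
try:
    etree.fromstring("<root>")
    parsed_ok = True
except etree.XMLSyntaxError as exc:
    parsed_ok = False
    print("parse error:", exc)
```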

    10. Building XML and HTML quickly with the E-factory

    >>> from lxml import etree
    >>> from lxml.builder import E
    
    >>> def CLASS(*args):  # "class" is a reserved word in Python
    ...     return {"class": ' '.join(args)}
    
    >>> html = page = (
    ...     E.html(       # create an Element called "html"
    ...         E.head(
    ...             E.title("This is a sample document")
    ...         ),
    ...         E.body(
    ...             E.h1("Hello!", CLASS("title")),
    ...             E.p("This is a paragraph with ", E.b("bold"), " text in it!"),
    ...             E.p("This is another paragraph, with a", "\n      ",
    ...                 E.a("link", href="http://www.python.org"), "."),
    ...             E.p("Here are some reserved characters: <spam&egg>."),
    ...             etree.XML("<p>And finally an embedded XHTML fragment.</p>"),
    ...         )
    ...     )
    ... )
    
    >>> print(etree.tostring(page, pretty_print=True, encoding="unicode"))
    <html>
      <head>
        <title>This is a sample document</title>
      </head>
      <body>
        <h1 class="title">Hello!</h1>
        <p>This is a paragraph with <b>bold</b> text in it!</p>
        <p>This is another paragraph, with a
          <a href="http://www.python.org">link</a>.</p>
        <p>Here are some reserved characters: &lt;spam&amp;egg&gt;.</p>
        <p>And finally an embedded XHTML fragment.</p>
      </body>
    </html>
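    The E-factory also works well for generating repetitive XML from data: string arguments become text, element arguments become children, and keyword arguments become attributes. A sketch in which the `items`/`item` tag names and the `id` attribute are made up for illustration:

```python
from lxml import etree
from lxml.builder import E

values = ["a", "b", "c"]
# Unpack a generator of child elements into the parent call
doc = E.items(*(E.item(v, id=str(i)) for i, v in enumerate(values)))

xml = etree.tostring(doc, encoding="unicode")
print(xml)
# <items><item id="0">a</item><item id="1">b</item><item id="2">c</item></items>
```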
  • Original article: https://www.cnblogs.com/dplearning/p/5762070.html
Copyright © 2020-2023  润新知