A summary of using etree from the lxml library

Original: https://blog.csdn.net/caicaibird0531/article/details/90694849

I. The etree Element class

1. Creating an XML tree with etree.Element()
```
from lxml import etree

root = etree.Element("root")
print(root.tag)
# Add child elements
root.append(etree.Element("child1"))
child2 = etree.SubElement(root, "child2")
child3 = etree.SubElement(root, "child3")
# Inspect the XML tree built so far (tostring() returns bytes, so decode it for readable output)
print(etree.tostring(root, pretty_print=True).decode())
```
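An element built this way behaves much like a Python list of its children, which makes it easy to check what the tree contains. The following is a minimal sketch that rebuilds the same tree; the variable names `first` and `tags` are only illustrative.

```python
from lxml import etree

# Rebuild the same small tree as above
root = etree.Element("root")
root.append(etree.Element("child1"))
etree.SubElement(root, "child2")
etree.SubElement(root, "child3")

# An element can be indexed, measured and iterated like a list of its children
first = root[0]
tags = [child.tag for child in root]

print(len(root))   # number of direct children
print(first.tag)
print(tags)
```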
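2. Attributes of etree.Element
```
from lxml import etree

root = etree.Element("root", goodmorning='Guten Tag')  # set an attribute, method 1: keyword argument
print(etree.tostring(root))
print(root.get('goodmorning'))  # read an attribute, method 1: get()

root.set("hello", "caicaibird")  # set an attribute, method 2: set()
print(root.attrib['hello'])  # read an attribute, method 2: the dict-like attrib
print(etree.tostring(root))
```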
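3. The text property of etree.Element
```
# Continues with the root element from the previous example
root.text = "好好学习天天向上"  # "study hard and make progress every day"
print(root.text)
# tostring() emits ASCII bytes by default, so non-ASCII text appears as &#...; character references
print(etree.tostring(root))
```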
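Because etree.tostring() returns ASCII-encoded bytes by default, non-ASCII text is shown as numeric character references. A small sketch of the alternatives, using the `encoding` argument of tostring(); the variable names are illustrative.

```python
from lxml import etree

root = etree.Element("root")
root.text = "好好学习天天向上"

# Default: ASCII bytes, non-ASCII characters become &#...; references
as_ascii = etree.tostring(root)

# encoding="unicode" returns a Python str with the characters intact
as_text = etree.tostring(root, encoding="unicode")

# encoding="utf-8" returns UTF-8 encoded bytes
as_utf8 = etree.tostring(root, encoding="utf-8")

print(as_ascii)
print(as_text)
print(as_utf8.decode("utf-8"))
```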
II. Parsing from strings and files

lxml.etree supports several ways of parsing XML. The main parsing functions are fromstring() and parse().

1. The fromstring() function

fromstring() is the simplest way to parse a string. It returns the root Element.
```
from lxml import etree

some_xml_data = "<root>data</root>"

root = etree.fromstring(some_xml_data)
print(root.tag)
print(etree.tostring(root))
```
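2. The XML() function

XML() behaves like fromstring(), and is typically used to write XML literals directly in source code.
```
from lxml import etree

root = etree.XML("<root>data</root>")
print(root.tag)
print(etree.tostring(root))
```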
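3. The HTML() function

HTML() is the counterpart of XML() for HTML literals. Its parser is lenient and wraps a fragment in html and body elements.
```
from lxml import etree

root = etree.HTML("<p>data</p>")
print(root.tag)  # html, not p: the parser adds the html and body wrappers
print(etree.tostring(root))
```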
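Unlike XML(), the HTML parser also repairs markup that is not well-formed, such as unclosed tags. A brief sketch; the broken markup in `broken` is made up for illustration.

```python
from lxml import etree

# A fragment with unclosed <p> tags (made-up sample input)
broken = "<div><p>first<p>second</div>"

root = etree.HTML(broken)

# The parser adds the missing <html> and <body> wrappers
print(root.tag)
print(etree.tostring(root).decode())

# The repaired paragraphs can be reached like any other elements
paragraphs = [p.text for p in root.iter("p")]
print(paragraphs)
```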
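4. The parse() function

parse() parses files or file-like objects.
```
from io import BytesIO
from lxml import etree

some_file_or_file_like_object = BytesIO(b"<root>data</root>")
# parse() returns an ElementTree rather than an Element; tostring() accepts both
tree = etree.parse(some_file_or_file_like_object)
print(etree.tostring(tree))
```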
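Note that parse() returns an ElementTree object wrapping the whole document, whereas fromstring() returns the root Element directly. A minimal sketch of moving between the two; the variable names are illustrative.

```python
from io import BytesIO
from lxml import etree

source = BytesIO(b"<root>data</root>")

tree = etree.parse(source)   # an ElementTree wrapping the document
root = tree.getroot()        # the root Element inside it

print(type(tree).__name__)
print(root.tag, root.text)

# fromstring() skips the wrapper and hands back the root element
same_root = etree.fromstring(b"<root>data</root>")
print(same_root.tag == root.tag)
```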
III. Searching a string with XPath
```
from lxml import etree

xml = '''
<bookstore>
<book category="WEB">
  <title lang="c">Learning XML</title>
  <author>Erik T. Ray</author>
  <year>2003</year>
  <price>39.95</price>
</book>
</bookstore>
'''
html = etree.HTML(xml)
# xpath() returns a list of the matching text nodes
result = html.xpath('//book/price/text()')
print(result)
```
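Since the sample is well-formed XML, it can also be parsed with etree.XML(), which keeps bookstore as the root instead of wrapping it in html and body. A sketch of a few more queries against the same data; the variable names are illustrative.

```python
from lxml import etree

xml = '''
<bookstore>
<book category="WEB">
  <title lang="c">Learning XML</title>
  <author>Erik T. Ray</author>
  <year>2003</year>
  <price>39.95</price>
</book>
</bookstore>
'''

root = etree.XML(xml)

# Node and text selections come back as lists
prices = root.xpath('//book/price/text()')
titles = root.xpath('//book[@category="WEB"]/title/text()')
langs = root.xpath('//title/@lang')

print(root.tag)
print(prices)
print(titles)
print(langs)

# Text results are strings; convert them before doing arithmetic
print(float(prices[0]))
```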
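IV. XPath reference
- How XPath parsing works:
  1. Instantiate an etree object and load the source of the page to be parsed into it.
  2. Call the xpath method of that object with an XPath expression to locate tags and capture their content.
- Installing the environment:
  pip install lxml
- How to instantiate an etree object: from lxml import etree
  1. Load the source of a local HTML file into an etree object: etree.parse(filePath). If the HTML is not well-formed XML, pass a lenient parser: etree.parse(filePath, etree.HTMLParser())
  2. Load source fetched from the internet into the object: etree.HTML(page_text), where page_text is the page source as a string
- xpath('XPath expression')
  - /: at the start of an expression, locates from the root node; between steps, denotes a single level.
  - //: denotes any number of levels; at the start of an expression, locates from any position in the document.
  - Attribute locating: //div[@class='song'], in general tag[@attrName="attrValue"]
  - Index locating: //div[@class="song"]/p[3]. Indexes start at 1.
  - Getting text:
    /text() returns the text directly inside the tag
    //text() returns the text of the tag and all of its descendants (all text content)
  - Getting attributes:
    /@attrName, e.g. img/@src
  - Operators: https://www.runoob.com/xpath/xpath-operators.html
    or (logical or), used inside a predicate: //div[@class='song' or @id='song']
    and (logical and), used inside a predicate: //div[@class='song' and @id='song']
    Written between two complete paths, as in //div[@class='song'] or //div[@class='song'], these operators return a boolean rather than a list of nodes.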
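The expressions in the list above can be tried out on a small page. The following is only a sketch: the HTML in `page_text` is a made-up sample standing in for a downloaded page, and the class name `tang` is hypothetical.

```python
from lxml import etree

# Made-up page source, standing in for downloaded HTML
page_text = '''
<html><body>
  <div class="song">
    <p>one</p><p>two</p><p>three</p>
    <a href="/a.html"><span>inner</span> link</a>
    <img src="/pic.png">
  </div>
  <div class="tang"><p>other</p></div>
</body></html>
'''

tree = etree.HTML(page_text)

# / steps one level at a time from the root; // matches at any depth
divs = tree.xpath('/html/body/div')
all_p = tree.xpath('//p')

# Attribute locating combined with 1-based indexing
third = tree.xpath('//div[@class="song"]/p[3]/text()')

# /text() gives direct text only; //text() includes text of all descendants
direct = tree.xpath('//div[@class="song"]/a/text()')
nested = tree.xpath('//div[@class="song"]/a//text()')

# Attribute values
src = tree.xpath('//div[@class="song"]/img/@src')

# or / and combine conditions inside a predicate
either = tree.xpath('//div[@class="song" or @class="tang"]')
both = tree.xpath('//a[@href and span]')

print(len(divs), len(all_p))
print(third, direct, nested, src)
print(len(either), len(both))
```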
Original address: https://www.cnblogs.com/forlive/p/16373662.html
