Java XML Parsing

1. XML framework structure

The Java SE 6 platform provides XML support in two main areas: XML processing (JAXP, Java API for XML Processing) and XML binding (JAXB, Java Architecture for XML Binding).

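JAXP comprises four frameworks: SAX, which walks the elements and hands each one to a handler; DOM, which builds a tree representation of the XML document; StAX, which parses in pull mode; and XSLT, which transforms XML data into other formats. JAXB binds XML documents to Java objects and, in the newer JDK releases, is used heavily in web service technologies.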
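To make the difference between the tree-building and the callback styles concrete, here is a minimal sketch that reads the same small document with DOM and with SAX. The class name `JaxpOverview`, the method names, and the inline XML string are illustrative assumptions, not part of the original article.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class JaxpOverview {
    // Hypothetical sample input used only for this sketch.
    static final String XML = "<Book><Title>T</Title><ISBN>1</ISBN></Book>";

    static InputStream input() {
        return new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8));
    }

    // DOM: the whole document is loaded into an in-memory tree, then queried.
    static String titleWithDom() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(input());
        return doc.getElementsByTagName("Title").item(0).getTextContent();
    }

    // SAX: the parser walks the document and calls back into a handler.
    static int countElementsWithSax() throws Exception {
        final int[] count = {0};
        SAXParserFactory.newInstance().newSAXParser().parse(input(), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                count[0]++; // one callback per start tag
            }
        });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println("DOM title: " + titleWithDom());
        System.out.println("SAX element count: " + countElementsWithSax());
    }
}
```

DOM is convenient when the document fits in memory and needs random access; SAX keeps memory use low because nothing is retained unless the handler stores it.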

The main Java libraries currently available for working with XML are listed at: http://www.open-open.com/31.htm

2. StAX parsing

Iterator Event Types

The event iterator API defines thirteen XMLEvent types. Table 3-2 lists eleven of them; `EntityDeclaration` and `NotationDeclaration` are not shown.

Table 3-2 XMLEvent Types


Event Type	Description
`StartDocument`	Reports the beginning of a set of XML events, including encoding, XML version, and standalone properties.
`StartElement`	Reports the start of an element, including any attributes and namespace declarations; also provides access to the prefix, namespace URI, and local name of the start tag.
`EndElement`	Reports the end tag of an element. Namespaces that have gone out of scope can be recalled here if they have been explicitly set on their corresponding `StartElement`.
`Characters`	Corresponds to XML `CData` sections and `CharacterData` entities. Note that ignorable whitespace and significant whitespace are also reported as `Characters` events.
`EntityReference`	Character entities can be reported as discrete events, which an application developer can then choose to resolve or pass through unresolved. By default, entities are resolved. Alternatively, if you do not want to report the entity as an event, replacement text can be substituted and reported as `Characters`.
`ProcessingInstruction`	Reports the target and data for an underlying processing instruction.
`Comment`	Returns the text of a comment.
`EndDocument`	Reports the end of a set of XML events.
`DTD`	Reports as `java.lang.String` information about the DTD, if any, associated with the stream, and provides a method for returning custom objects found in the DTD.
`Attribute`	Attributes are generally reported as part of a `StartElement` event. However, there are times when it is desirable to return an attribute as a standalone `Attribute` event; for example, when a namespace is returned as the result of an `XQuery` or `XPath` expression.
`Namespace`	As with attributes, namespaces are usually reported as part of a `StartElement`, but there are times when it is desirable to report a namespace as a discrete `Namespace` event.
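As a rough illustration of how these event types show up in code, the sketch below pulls events from an `XMLEventReader` and labels each one by its type. The class name `EventTypeDemo`, the method `describe`, and the sample input are assumptions made for this example. Coalescing is switched on so that adjacent character data arrives as a single `Characters` event.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Comment;
import javax.xml.stream.events.XMLEvent;

final class EventTypeDemo {
    // Returns one label per event, in the order the reader delivers them.
    static List<String> describe(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));
        List<String> seen = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                switch (event.getEventType()) {
                    case XMLStreamConstants.START_DOCUMENT:
                        seen.add("StartDocument");
                        break;
                    case XMLStreamConstants.START_ELEMENT:
                        seen.add("StartElement "
                                + event.asStartElement().getName().getLocalPart());
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        seen.add("EndElement "
                                + event.asEndElement().getName().getLocalPart());
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        seen.add("Characters " + event.asCharacters().getData());
                        break;
                    case XMLStreamConstants.COMMENT:
                        seen.add("Comment " + ((Comment) event).getText());
                        break;
                    case XMLStreamConstants.END_DOCUMENT:
                        seen.add("EndDocument");
                        break;
                    default:
                        // Any other type (DTD, processing instruction, ...) is just numbered.
                        seen.add("Other " + event.getEventType());
                }
            }
        } finally {
            reader.close();
        }
        return seen;
    }

    public static void main(String[] args) throws XMLStreamException {
        for (String label : describe("<a><!--note--><b>hi</b></a>")) {
            System.out.println(label);
        }
    }
}
```

The `isStartElement()`, `isCharacters()` and similar predicate methods on `XMLEvent` are an alternative to switching on `getEventType()`; which reads better is mostly a matter of taste.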

Sample Event Mapping

As an example of how the event iterator API maps an XML stream, consider the following XML document:

<?xml version="1.0"?>
<BookCatalogue xmlns="http://www.publishing.org">
  <Book>
    <Title>Yogasana Vijnana: the Science of Yoga</Title>
    <ISBN>81-40-34319-4</ISBN>
    <Cost currency="INR">11.50</Cost>
  </Book>
</BookCatalogue>

This document would be parsed into eighteen primary and secondary events, as shown below. Note that secondary events, shown in curly braces ({}), are typically accessed from a primary event rather than directly.

Table 3-3 Sample Iterator API Event Mapping
#	Element/Attribute	Event
1	version="1.0"	StartDocument
2	isCData = false data = " " IsWhiteSpace = true	Characters
3	qname = BookCatalogue:http://www.publishing.org attributes = null namespaces = {"BookCatalogue" -> "http://www.publishing.org"}	StartElement
4	qname = Book attributes = null namespaces = null	StartElement
5	qname = Title attributes = null namespaces = null	StartElement
6	isCData = false data = "Yogasana Vijnana: the Science of Yoga " IsWhiteSpace = false	Characters
7	qname = Title namespaces = null	EndElement
8	qname = ISBN attributes = null namespaces = null	StartElement
9	isCData = false data = "81-40-34319-4 " IsWhiteSpace = false	Characters
10	qname = ISBN namespaces = null	EndElement
11	qname = Cost attributes = {"currency" -> INR} namespaces = null	StartElement
12	isCData = false data = "11.50 " IsWhiteSpace = false	Characters
13	qname = Cost namespaces = null	EndElement
14	isCData = false data = " " IsWhiteSpace = true	Characters
15	qname = Book namespaces = null	EndElement
16	isCData = false data = " " IsWhiteSpace = true	Characters
17	qname = BookCatalogue:http://www.publishing.org namespaces = {"BookCatalogue" -> "http://www.publishing.org"}	EndElement
18		EndDocument
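
The secondary events in the mapping above (attributes and namespaces) are normally reached through their `StartElement`. The following sketch, under the assumption that the sample catalogue is held in a string, collects one line per start tag together with its attributes and namespace declarations. The class name `SecondaryEventsDemo` and the method `startElements` are hypothetical. The number of whitespace `Characters` events a given parser reports may differ from the table, so this example deliberately looks only at start elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.Namespace;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

final class SecondaryEventsDemo {
    // The sample document from the text, embedded as a string for this sketch.
    static final String CATALOGUE =
            "<?xml version=\"1.0\"?>\n"
          + "<BookCatalogue xmlns=\"http://www.publishing.org\">\n"
          + "  <Book>\n"
          + "    <Title>Yogasana Vijnana: the Science of Yoga</Title>\n"
          + "    <ISBN>81-40-34319-4</ISBN>\n"
          + "    <Cost currency=\"INR\">11.50</Cost>\n"
          + "  </Book>\n"
          + "</BookCatalogue>\n";

    // One line per StartElement: local name, then attributes, then namespace declarations.
    static List<String> startElements(String xml) throws XMLStreamException {
        XMLEventReader reader =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        List<String> lines = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (!event.isStartElement()) {
                    continue; // skip every primary event except start tags
                }
                StartElement start = event.asStartElement();
                StringBuilder line = new StringBuilder(start.getName().getLocalPart());
                // Attributes are secondary events hanging off the StartElement.
                for (Iterator<?> it = start.getAttributes(); it.hasNext();) {
                    Attribute attr = (Attribute) it.next();
                    line.append(" attr ").append(attr.getName().getLocalPart())
                        .append('=').append(attr.getValue());
                }
                // Namespace declarations made on this element, also secondary events.
                for (Iterator<?> it = start.getNamespaces(); it.hasNext();) {
                    Namespace ns = (Namespace) it.next();
                    String prefix = ns.getPrefix();
                    line.append(" xmlns");
                    if (prefix != null && !prefix.isEmpty()) {
                        line.append(':').append(prefix);
                    }
                    line.append('=').append(ns.getNamespaceURI());
                }
                lines.add(line.toString());
            }
        } finally {
            reader.close();
        }
        return lines;
    }

    public static void main(String[] args) throws XMLStreamException {
        for (String line : startElements(CATALOGUE)) {
            System.out.println(line);
        }
    }
}
```

The iterators are typed as `Iterator<?>` with explicit casts so the sketch compiles both on older JDKs, where these methods return a raw `Iterator`, and on newer ones, where they are generic.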


Original article: https://www.cnblogs.com/ranger2016/p/3872796.html