As mentioned in the introduction Sun now provides these tools for XML Processing in Java:
In the following sections I will talk a bit about what these tools are, and what their purposes are.
StAX
The Java StAX API is a streaming API for reading and writing XML streams. As such it resembles the older SAX API, except that the SAX API can only be used to read XML, not write it.
The main difference between the SAX and StAX API is that when using SAX you provide a handler to the SAX parser. This handler then has methods called on it, corresponding to the entitites (elements, comments, text etc.) found in the XML document. It is the SAX parser that controls the iteration through the XML document. With the StAX parser you control the iteration through the XML document. You call the next()
method (or corresponding method) when you feel you are ready to process the next element, text node etc. From the next()
method you obtain an object that can tell you what entity you have met in the XML document.
Controlling the iteration through the XML document like this can be an advantage. The iteration can be kept within the scope of one method (perhaps calling submethods). This means that you can use / access the same local variables when processing a piece of text, as when processing an element. With SAX, these two entities are handled by two different methods in your handler object. Thus, if you need to access shared variables from these different methods, these variables has to be member variables in the handler object. Not a huge difference, but local variables may still be preferable in many situations.
The StAX API interfaces comes with Java 6, but there is yet no implementation. A standard implementation can be found at stax.codehaus.org.
SAX Parser
The SAX parser is the first API for processing XML entity by entity (element, text, comments etc.) as they are met during traversal of the XML document. When you use a SAX parser you pass a handler object to the SAX parser. This handler object has a method for every "event" you want to handle, during the traversal of the XML document. Examples of events are startDocument
, startElement
, characters
etc.
The SAX parser is mostly suited for processing XML documents where each element can be processed individually. Documents where you need access to earlier or later parts of the document, to process a given element, are not as easy to handle. Well, if you need access to earlier parts, you can store that earlier part when it occurs in a member variable in the handler object. But if you need access to later elements, this is kind of difficult. Unless you process the earlier elemements when the later elements needed are encountered. Yet, if you need to jump around the XML document when processing it, it might be easier to use a DOM parser.
In most cases where you find a SAX parser useful you may be better off using a StAX reader instead.
The SAX interfaces and implementation comes with Java (at least from Java 5).
DOM Parser
The DOM (Document Object Model) parser parses an XML document into an object graph. The whole document is converted into one big objects. Once created you can traverse the object graph at will. You can walk up and down in the graph as you please. This object graph takes up a lot of memory, so this should only be used in situations where no other options are suitable.
The DOM interfaces and implementation comes with Java (at least from Java 5).
XPath Evaluator
Java comes with a built-in XPath evaluator. You set up an XPath expression, and have the evaluator evalute the expression on a DOM object. The evaluator then returns the elements matching the XPath expression. XPaths can be a handy way of finding the nodes you need to process, rather than navigate down to them yourself.
XSL Processor
Java comes with a built-in XSL Processor. An XSL Processor transforms an input XML document to an output document (not necessarily XML), according to an XSL stylesheet. An example appliance of a stylesheet would be to transform an XML document containing data, to an HTML document with that data presented in a more humanly readable format, for instance in tables, lists etc.
JAXB
JAXB is an API that resembles the DOM API. JAXB lets you generate classes from an XSL Schema, matching the XML document defined in that schema. JAXB then lets you read an XML document conforming to that schema, into the an object structure built from the generated objects. You can also serialize that object structure back to disk or network.
The generated JAXB classes looks more like regular domain objects. They have getters and setters with names matching the element names. The DOM API just has methods like addElement()
etc. where the concrete element name is a parameter, and you need to know what elements can be added as children at any given element in the DOM structure. The JAXB generated clasess thus help you more, by reflecting the allowed structure in class and method names.