Introduction
XML is an increasingly popular way to encode documents, data, and electronic messages. Over the years Microsoft has offered a variety of libraries to facilitate creating, modifying, querying, and searching XML documents. LINQ to XML is a relatively new set of XML-related classes in the .NET Framework (found in the System.Xml.Linq
namespace), which enable developers to work with XML documents using LINQ's features, syntax, and semantics. As discussed in an earlier article, Introducing LINQ to XML, LINQ to XML is a simpler, easier-to-use API than the XML libraries that preceded it. Because LINQ to XML can use LINQ's query syntax and its assortment of standard query operators, LINQ to XML code is usually terse and readable.
This article continues our look at LINQ to XML. Specifically, we explore how to query XML documents using axis methods as well as how to search and filter XML documents using both LINQ's Where
method and XPath expressions. Read on to learn more!
Retrieving Child Elements
As discussed in Introducing LINQ to XML, the most frequently used class in the LINQ to XML API is the XElement class, which represents an XML element. This class is used when programmatically constructing an XML document, when loading an XML document, and when searching, filtering, or otherwise enumerating the elements within an XML document. Its Load method loads an XML document from disk or over the Internet and returns the root of the just-loaded document. Its Value property returns the concatenated text content of the element and of its descendants.
When working with an XML document we are often interested in a particular element or attribute value or a particular subset of elements and attribute values. The XElement
object has a number of helpful methods that we can use to retrieve such data. Let's start by looking at two of the most commonly used methods, Elements
and Element
. The Elements
method returns all of the child elements of the current element. You can optionally pass in an element name, in which case only those child elements with a matching name are returned. The Element
method requires a name as an input parameter and then returns the first child element with that name.
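To make the difference between the two methods concrete, here is a minimal sketch. It is written as a small console program with top-level statements (C# 9 or later) so that it runs on its own, and the inline XML is a hypothetical, abbreviated stand-in for a real document; the element values are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical, abbreviated stand-in document (values are illustrative only).
XElement root = XElement.Parse(@"<nutrition>
  <daily-values><total-fat units='g'>65</total-fat></daily-values>
  <food><name>Avocado Dip</name><calories total='110' fat='100' /></food>
  <food><name>Bagel</name><calories total='300' fat='35' /></food>
</nutrition>");

// Elements() with no argument returns every child element of root.
int allChildren = root.Elements().Count();

// Elements(name) returns only the child elements with a matching name.
int foodChildren = root.Elements("food").Count();

// Element(name) returns just the first child element with that name.
XElement firstFood = root.Element("food");

Console.WriteLine(allChildren);                     // prints "3"
Console.WriteLine(foodChildren);                    // prints "2"
Console.WriteLine(firstFood.Element("name").Value); // prints "Avocado Dip"
```

Note that grandchildren such as the name elements are not counted by Elements(); only direct children of the element the method is called on are returned.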
The Elements
and Element
methods - along with a number of other methods we'll be examining in this article - are referred to as axis methods and operate relative to the current node. To hammer home this point, let's look at an example. For this example and others in this article I will be using an XML file named NutritionInfo.xml
. This XML file can be found in the App_Data
folder in the demo available for download at the end of this article.
The NutritionInfo.xml
document contains nutritional information about a variety of food items. Here is a snippet of this XML document:
<nutrition> |
The <nutrition>
element is the root element and contains a single child element named <daily-values>
, which spells out the recommended daily allotments for the various nutritional metrics provided by each <food>
item. Note that there is only one <daily-values>
element. Following this sole <daily-values>
element there are a number of <food>
elements that spell out the nutritional information for a number of food items. The snippet above shows a single <food>
element describing the nutritional information for Avocado Dip.
Now, imagine that we wanted to retrieve the name of the first food item in the XML document. To accomplish this we'd need to start by loading the XML document. Recall that the Load
method returns the root of the document as an XElement
object (in this example, <nutrition>
).
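A minimal sketch of the loading step follows. So that it runs on its own, it first writes a tiny, hypothetical stand-in document to a temporary file; in an ASP.NET page the path would more likely point at the file in the App_Data folder (for instance via Server.MapPath), which is an assumption on my part about how the demo is wired up.

```csharp
using System;
using System.IO;
using System.Xml.Linq;

// Assumption: a tiny stand-in file is created here so the sketch is self-contained.
string path = Path.Combine(Path.GetTempPath(), "NutritionInfo.xml");
File.WriteAllText(path,
    "<nutrition><daily-values /><food><name>Avocado Dip</name></food></nutrition>");

// Load reads the document and returns its root element, here <nutrition>.
XElement root = XElement.Load(path);

Console.WriteLine(root.Name); // prints "nutrition"

// Clean up the temporary file; the loaded tree stays in memory.
File.Delete(path);
```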
Now that we have a reference to the root we can get the first <food>
element using the following syntax:
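A minimal sketch of this call, using a hypothetical inline document in place of NutritionInfo.xml so that it can be run as-is:

```csharp
using System;
using System.Xml.Linq;

// Hypothetical stand-in for the loaded document.
XElement root = XElement.Parse(
    "<nutrition><daily-values />" +
    "<food><name>Avocado Dip</name></food>" +
    "<food><name>Bagel</name></food></nutrition>");

// The first child element of root named <food>.
XElement firstFoodElement = root.Element("food");

Console.WriteLine(firstFoodElement.Name);         // prints "food"

// A name with no matching child yields null rather than an exception.
Console.WriteLine(root.Element("snack") == null); // prints "True"
```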
This syntax says, in English, "Get me the root's first child element named <food>
." (If there are no <food>
child elements then root.Element("food")
will return null
.) Once we have the first <food>
element we can get its <name>
child element using the same syntax:
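Sketched against the same kind of hypothetical inline document, this step might look like the following; nameElement is a variable name I have chosen for illustration.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical stand-in for the loaded document.
XElement root = XElement.Parse(
    "<nutrition><daily-values />" +
    "<food><name>Avocado Dip</name></food></nutrition>");

XElement firstFoodElement = root.Element("food");

// Element is called on the <food> element, so it searches <food>'s children.
XElement nameElement = firstFoodElement.Element("name");

Console.WriteLine(nameElement.Name);             // prints "name"

// Asking the root for <name> finds nothing: <name> is a grandchild, not a child.
Console.WriteLine(root.Element("name") == null); // prints "True"
```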
Note that to get the <name>
element we call the firstFoodElement
XElement
object's Element
method. Had we accidentally used root.Element("name")
we'd get back a null value because the root element does not have any <name>
child elements (it only has <daily-values>
and <food>
child elements).
Now that we have the <name>
element (of the first <food>
element) we can get its text value ("Avocado Dip", in this example) by using the Value
property.
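Putting the pieces together, a minimal sketch (again with a hypothetical inline document, and with firstFoodName as an illustrative variable name) could read:

```csharp
using System;
using System.Xml.Linq;

// Hypothetical stand-in for the loaded document.
XElement root = XElement.Parse(
    "<nutrition><daily-values />" +
    "<food><name>Avocado Dip</name></food></nutrition>");

// Walk from the root to the first <food>, then to its <name> child.
XElement nameElement = root.Element("food").Element("name");

// Value returns the element's text content as a string.
string firstFoodName = nameElement.Value;

Console.WriteLine(firstFoodName); // prints "Avocado Dip"
```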
Reading Attributes
Another important class in the LINQ to XML API is the XAttribute
class, which represents an XML attribute. The XElement
class has two methods that return XAttribute
values:
Attribute(attributeName) - returns an XAttribute object for a specific attribute, and
Attributes - two overloads; the first accepts no input parameters and returns all attributes of the XElement; the second overload accepts an attribute name and returns a collection of the XElement's attributes with a matching name.
And, like XElement, the XAttribute
class has a Value
property, which returns the value of the attribute.
Let's look at using the Attribute
method to retrieve calorie information for the first food item (Avocado Dip). The NutritionInfo.xml
file specifies calorie information using a <calories>
element with two attributes - total and fat - which hold the total calories and the calories from fat, respectively. To retrieve these values programmatically we could use the following code:
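A minimal sketch of such code follows. The inline document is a hypothetical stand-in; the total of 110 matches the Avocado Dip value discussed in this article, while the fat value is made up for illustration.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical stand-in; the fat value is illustrative only.
XElement root = XElement.Parse(
    "<nutrition><daily-values />" +
    "<food><name>Avocado Dip</name><calories total='110' fat='100' /></food>" +
    "</nutrition>");

// root -> first <food> -> first <calories> -> attribute -> its string value.
string totalCalories = root.Element("food").Element("calories").Attribute("total").Value;
string fatCalories = root.Element("food").Element("calories").Attribute("fat").Value;

Console.WriteLine(totalCalories); // prints "110"
Console.WriteLine(fatCalories);   // prints "100"
```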
This syntax, I think, is pretty readable. For example, to get the total calories for the first food item we say, "Hey, root, give me your first <food>
element and then, from that, give me the first <calories>
element and then from that get the total
attribute and then give me its value." In the case of Avocado Dip, this returns a value of "110".
While the above syntax is quite terse and readable, it does make a number of assumptions - namely that there will be at least one <food>
child element under the root and that that <food>
item will have a <calories>
child and that the <calories>
element will have a total
attribute specified. If any of these elements or attributes are missing the above code will throw a NullReferenceException
because if no match is found the Element
and Attribute
methods return null
. To more safely query the XML document you would need to get the pieces one at a time and ensure that a null
value was not returned; the code in the demo available for download has a sample of this more careful syntax.
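The following is my own minimal sketch of that more careful approach, not the code from the demo. DescribeTotalCalories is a hypothetical helper name, and the inline documents are stand-ins.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper: fetch each piece separately and check for null before moving on.
string DescribeTotalCalories(XElement doc)
{
    var food = doc.Element("food");
    if (food == null) return "no <food> element";

    var calories = food.Element("calories");
    if (calories == null) return "no <calories> element";

    var total = calories.Attribute("total");
    if (total == null) return "no total attribute";

    return total.Value;
}

XElement complete = XElement.Parse(
    "<nutrition><food><calories total='110' /></food></nutrition>");
XElement incomplete = XElement.Parse("<nutrition><food /></nutrition>");

Console.WriteLine(DescribeTotalCalories(complete));   // prints "110"
Console.WriteLine(DescribeTotalCalories(incomplete)); // prints "no <calories> element"
```

In C# 6 and later the same checks can be written more compactly with the null-conditional operator, as in root.Element("food")?.Element("calories")?.Attribute("total")?.Value, which evaluates to null if any piece is missing.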
Returning Descendant and Ancestor Elements
The Element
and Elements
methods only search the set of child elements. For XML documents that specify a hierarchical structure, such as the XML format of the Web.sitemap
file, there may be elements with the same name buried at arbitrary depths. To search across all descendants of the current element (and not just its children) use the Descendants
method. The Descendants
method has two overloads. The first accepts no input parameters and returns all descendant elements. The second accepts a name and returns only those descendants whose name matches.
The following snippet of code shows how to use the Descendants
method to determine how many <siteMapNode>
elements exist in the Web.sitemap
file. (If you are unfamiliar with the Web.sitemap
file it is an XML-formatted file that developers can create to define a logical structure to their site. Once defined, navigation web controls like the Menu or TreeView can be used to display this site structure. The Web.sitemap
file is composed of an arbitrary number of <siteMapNode>
elements, where each <siteMapNode>
element represents a section on the site. These elements can be (and often are) nested. See Examining ASP.NET's Site Navigation for more information on this file and ASP.NET's site map functionality.)
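Here is a minimal sketch of that count. To keep it runnable on its own it parses a small, hypothetical site map from a string rather than loading a real Web.sitemap file; the URLs and titles are invented.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical site map; a real application would load its own Web.sitemap file.
XElement root = XElement.Parse(
@"<siteMap xmlns='http://schemas.microsoft.com/AspNet/SiteMap-File-1.0'>
  <siteMapNode url='~/Default.aspx' title='Home'>
    <siteMapNode url='~/Products.aspx' title='Products'>
      <siteMapNode url='~/Widgets.aspx' title='Widgets' />
    </siteMapNode>
    <siteMapNode url='~/About.aspx' title='About' />
  </siteMapNode>
</siteMap>");

// The elements live in this XML namespace, so the name must be qualified with it.
XNamespace siteMapNS = "http://schemas.microsoft.com/AspNet/SiteMap-File-1.0";

// Descendants searches every level beneath root, not just its children.
int nodeCount = root.Descendants(siteMapNS + "siteMapNode").Count();

// Without the namespace the name does not match any element.
int unqualifiedCount = root.Descendants("siteMapNode").Count();

Console.WriteLine(nodeCount);        // prints "4"
Console.WriteLine(unqualifiedCount); // prints "0"
```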
The code here is a little bit more involved than previous examples because the Web.sitemap
uses XML namespaces. If you examine the Web.sitemap
you'll find that its root element (<siteMap>
) defines a namespace named "http://schemas.microsoft.com/AspNet/SiteMap-File-1.0":
<siteMap xmlns="http://schemas.microsoft.com/AspNet/SiteMap-File-1.0"> |
Querying an XML document that uses namespaces requires that the namespaces be included in the querying syntax. This is accomplished by creating an XNamespace
object that specifies the namespace name and then including it as part of the name in the XElement
's methods. In the above example this is accomplished by creating an XNamespace
object named siteMapNS
and then including it when calling the Descendants
method: root.Descendants(siteMapNS + "siteMapNode")
.
Along with the Descendants
method, the XElement class also offers an Ancestors
method. This method is the inverse of Descendants
- rather than returning the elements (or matching elements) beneath the current element, it returns the parent element, the grandparent element, and so forth, all the way up to the root. See the demo available for download for an example that uses the Ancestors
method.
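As a rough illustration (my own sketch rather than the demo's code), the following walks upward from a deeply nested node in a small, hypothetical site map whose URLs and titles are invented:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical site map used only for this sketch.
XElement root = XElement.Parse(
@"<siteMap xmlns='http://schemas.microsoft.com/AspNet/SiteMap-File-1.0'>
  <siteMapNode url='~/Default.aspx' title='Home'>
    <siteMapNode url='~/Products.aspx' title='Products'>
      <siteMapNode url='~/Widgets.aspx' title='Widgets' />
    </siteMapNode>
  </siteMapNode>
</siteMap>");

XNamespace siteMapNS = "http://schemas.microsoft.com/AspNet/SiteMap-File-1.0";

// Locate the nested node whose title is "Widgets".
XElement widgets = root.Descendants(siteMapNS + "siteMapNode")
    .First(n => (string)n.Attribute("title") == "Widgets");

// Ancestors(name) yields matching ancestors, nearest parent first.
string trail = string.Join(" < ",
    widgets.Ancestors(siteMapNS + "siteMapNode")
           .Select(a => (string)a.Attribute("title")));

Console.WriteLine(trail);                       // prints "Products < Home"

// With no name, every ancestor is returned, including the <siteMap> root.
Console.WriteLine(widgets.Ancestors().Count()); // prints "3"
```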
Searching / Filtering an XML Document
Because the LINQ to XML API gives us full access to LINQ's standard query operators, searching or filtering an XML document is very straightforward. As discussed in previous installments of this article series, the Where
extension method can operate on an enumeration and filter certain elements out of that enumeration using lambda expressions.
For example, use the following Where
clause to retrieve only those food items with less than 300 total calories:
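A minimal sketch of such a filter follows. The inline document is a hypothetical stand-in and, apart from Avocado Dip's 110 total calories, its values are made up; lowCalorieFoods is an illustrative variable name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical stand-in for NutritionInfo.xml (values are illustrative only).
XElement root = XElement.Parse(@"<nutrition>
  <daily-values />
  <food><name>Avocado Dip</name><calories total='110' fat='100' /></food>
  <food><name>Bagel</name><calories total='300' fat='35' /></food>
  <food><name>Frankfurter</name><calories total='370' fat='290' /></food>
</nutrition>");

// Keep only the <food> children whose total calories are below 300.
// Each f is an XElement for one <food>; Value is a string, so it is converted first.
IEnumerable<XElement> lowCalorieFoods = root.Elements("food")
    .Where(f => Convert.ToDecimal(f.Element("calories").Attribute("total").Value) < 300);

foreach (XElement food in lowCalorieFoods)
{
    Console.WriteLine(food.Element("name").Value); // prints "Avocado Dip"
}
```

The Bagel entry, at exactly 300, is excluded because the comparison is strictly less than.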
The code here says, in English, "Give me all <food>
child elements off the root and then only return those whose <calories>
element's total
attribute has a value less than 300." Bear in mind that in the lambda expression passed to the Where
method we are dealing with <food>
elements; in other words, each f
here is an XElement
that represents a particular <food>
element in the XML document. Consequently, to retrieve the calorie information for each <food>
element we use f.Element("calories")
to get a reference to the <calories>
element and then Attribute("total").Value
to get the value of the total
attribute. The XAttribute
class's Value
property returns a string, so we need to convert this string into a decimal value in order to compare it to a numeric value, in this case 300.
In addition to searching and filtering XML documents using the LINQ standard query operators you can use XPath expressions. XPath is a standardized syntax for filtering XML documents. To filter documents using XPath expressions use the XPathSelectElements
method, which is an extension method defined in the System.Xml.XPath
namespace. The following example uses an XPath expression to return only those food items with less than 300 calories:
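A minimal sketch of the XPath version, using the same kind of hypothetical inline document (values other than Avocado Dip's 110 total calories are invented):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath; // brings the XPathSelectElements extension method into scope

// Hypothetical stand-in for NutritionInfo.xml (values are illustrative only).
XElement root = XElement.Parse(@"<nutrition>
  <daily-values />
  <food><name>Avocado Dip</name><calories total='110' fat='100' /></food>
  <food><name>Bagel</name><calories total='300' fat='35' /></food>
  <food><name>Frankfurter</name><calories total='370' fat='290' /></food>
</nutrition>");

// The expression is evaluated relative to root: select <food> children
// whose <calories> child has a total attribute below 300.
IEnumerable<XElement> lowCalorieFoods =
    root.XPathSelectElements("food[calories/@total < 300]");

foreach (XElement food in lowCalorieFoods)
{
    Console.WriteLine(food.Element("name").Value); // prints "Avocado Dip"
}
```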
Personally, I prefer using LINQ's standard query operators. With the standard query operators and lambda expressions you get IntelliSense and compile-time checking. Moreover, the same standard query operators can be used with LINQ to Objects, LINQ to SQL, LINQ to Entities, or any other LINQ provider. XPath expressions, on the other hand, are opaque strings. There is no compile-time checking - you need to actually execute the code to see whether the XPath expression is valid and returns the expected results. And XPath's syntax is specific to XML.
Check out the demo for more searching and filtering code examples. The demo includes a web page that allows the user to search for food items that meet a variety of criteria, including upper bounds for the calories, grams of fat, and milligrams of sodium, as well as the presence of certain vitamins or minerals. The screen shot below shows this page from the demo in action and includes code showing how to filter using the standard query operators and using XPath expressions.
Looking Forward...
At this point we have examined how to create, query, and filter XML documents using the LINQ to XML API. In a future installment we'll see how to edit existing XML documents by modifying existing values and by adding and removing XML elements.
Until then... Happy Programming!