Working with XML in a Classic COM Application
By Kate Gregory
From Kate Gregory's Codeguru column, "Using Visual C++ .NET".
XML is at the heart of .NET. You can hardly read a single page of a .NET article, whitepaper, or help entry without coming across those three little letters. But XML was changing everything before .NET came along, and you can work with it from a "traditional" Win32 application. In this column, I'll cover how to read, write, and transform XML using the Microsoft COM component called MSXML4. Next time, I'll tackle the same tasks the .NET way.
First, if you don't have MSXML4 installed, you're going to need it. Download it from http://msdn.microsoft.com/downloads/default.asp?url=/downloads/sample.asp?url=/msdn-files/027/001/766/msdncompositedoc.xml and follow the instructions there to install it on your development machine. If you're planning to distribute applications that use MSXML4, you'll want to get the redistributable cab file too.
Second, if you have no clue what XML is or why you should care, here's a quick laundry list of XML features. XML is a notation for representing structured and semi-structured data that:
- is plain text that can be read and understood by people
- is plain text that can easily be hand-typed
- can be processed in just a few lines of code using readily available (often free) components for essentially any language and operating system
- can be generated in just a few lines of code using those same components
- can be easily transformed into HTML, PDF, PostScript, or a variety of other print-friendly and display-friendly formats
- can be persisted back and forth to almost any database format using widely available components
- features as few rules and prerequisites as possible for maximum availability and flexibility
If you'd like to know more, check out www.xml.org or www.w3.org/XML/1999/XML-in-10-points to see what all the fuss is about.
Sample XML
Here's a really simple file of XML to use in the sample application:
<?xml version="1.0" encoding="utf-8" ?>
<PurchaseOrder>
<Customer id="123"/>
<Item SKU="1234" Price="4.56" Quantity="1"/>
<Item SKU="1235" Price="4.58" Quantity="2"/>
</PurchaseOrder>
Loading XML with COM
The simplest application to demonstrate working with XML is a console application. To create one in Visual Studio .NET, choose File, New, Project, Visual C++ Projects, Win32 application. Change the application settings to Console Application. (Still using Visual C++ 6? Make a console app by choosing File, New Project, Win32 application and change the settings to Console Application. You should be able to use this same code.) I named mine XMLCOM, so my _tmain() function is in XMLCOM.cpp. Here's what it looks like:
#include "stdafx.h"
#import "msxml4.dll"
using namespace MSXML2;
#include <iostream>
using std::cout;
using std::endl;
int _tmain(int argc, _TCHAR* argv[])
{
    CoInitialize(NULL);
    { //extra braces for scope only
        MSXML2::IXMLDOMDocumentPtr
            xmlDoc("MSXML2.DOMDocument.4.0");
        xmlDoc->async = false;
        bool ret = xmlDoc->load("sample.xml");
        if (ret)
        {
            cout << "Document loaded ok." << endl;
        }
        else
        {
            cout << "load problem" << endl;
        }
    }
    CoUninitialize();
    return 0;
}
You can see that I'm using #import to bring in the COM library to make my coding as simple as possible. I call CoInitialize() at the start, to get COM ready for me, and CoUninitialize() at the end. In between I make an instance of the COM object on the stack. It looks like a pointer, but it's really an object using a template set up for me by the #import statement. It will clean up after itself when it goes out of scope, so I've wrapped it up in an extra set of brace brackets just so that I can send it out of scope before calling CoUninitialize(). That part should be familiar to COM programmers who've used the #import timesavers before.
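To see why the extra braces matter, here's a minimal sketch that runs without COM at all. RuntimeInit(), RuntimeUninit(), and ScopedObject are hypothetical stand-ins (they are not part of COM or MSXML) for CoInitialize(), CoUninitialize(), and the smart pointer. All they do is record the order in which things happen.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins: nothing here is a COM call. The log simply
// records the order of events so it can be inspected afterwards.
std::vector<std::string> g_events;

void RuntimeInit()   { g_events.push_back("init"); }    // plays the role of CoInitialize()
void RuntimeUninit() { g_events.push_back("uninit"); }  // plays the role of CoUninitialize()

// Plays the role of a smart pointer such as IXMLDOMDocumentPtr:
// it acquires in the constructor and releases in the destructor.
class ScopedObject
{
public:
    ScopedObject()  { g_events.push_back("acquire"); }
    ~ScopedObject() { g_events.push_back("release"); }
};

// Same shape as _tmain() above; returns the recorded order of events.
std::vector<std::string> RunWithInnerScope()
{
    g_events.clear();
    RuntimeInit();
    {   // extra braces for scope only
        ScopedObject doc;
    }   // doc is released here, while the runtime is still initialized
    RuntimeUninit();
    return g_events;
}
```

Calling RunWithInnerScope() records init, acquire, release, uninit, in that order. Without the inner braces, the object would live until the end of the function, so its release would come after the uninit.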
Where's the XML part? The call to load(). This function takes a URL or a relative file name, and reads in the XML from that URL or in that file. It returns false if there's a problem, such as the file not being found or the XML in it not being well-formed. One line of code to get all that XML parsed and into memory. Now you can do things with it.
Simple arithmetic with the contents of a document
Here's a simple thing to do:
double total = 0;
cout << "Document loaded ok." << endl;
MSXML2::IXMLDOMNodeListPtr items =
    xmlDoc->getElementsByTagName("Item");
long numitems;
items->get_length(&numitems);
for (int i = 0; i < numitems; i++)
{
    MSXML2::IXMLDOMNodePtr item;
    items->get_item(i, &item);
    double price =
        item->attributes->getNamedItem("Price")->GetnodeValue();
    double qty =
        item->attributes->getNamedItem("Quantity")->GetnodeValue();
    total += price * qty;
}
cout << "Purchase Order total is $" << total << endl;
I put this code in the if (ret) block, replacing the single output statement that was there before. You can see that it uses a variety of handy functions from the DOM API:
- getElementsByTagName() returns a list of all the <Item> elements.
- get_length() gets the length of the list of <Item> elements.
- get_item() gets an item from the list.
- attributes is a property of the item, and it's a list of attributes on the item. The list is held as a map, or lookup table.
- getNamedItem() looks up entries in the map using the string provided.
- GetnodeValue() extracts the contents of the attribute so you can use it in simple arithmetic.
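Attribute values in XML are text, so assigning an attribute's value to a double relies on a conversion from string to number. The sketch below does the same arithmetic outside of MSXML, with that conversion made explicit. ItemText, ToNumber(), PurchaseOrderTotal(), and kSampleItems are names invented for illustration, and std::strtod() is only standing in for the conversion that the variant helper performs.

```cpp
#include <cstddef>
#include <cstdlib>

// Invented for illustration: one <Item> with its attributes still held
// as text, the way they appear in the sample file.
struct ItemText
{
    const char* price;
    const char* quantity;
};

// Stand-in for the string-to-number conversion; this is not an MSXML call.
double ToNumber(const char* text)
{
    return std::strtod(text, nullptr);
}

// Same loop as the DOM version: add up price * quantity for every item.
double PurchaseOrderTotal(const ItemText* items, std::size_t count)
{
    double total = 0;
    for (std::size_t i = 0; i < count; ++i)
    {
        total += ToNumber(items[i].price) * ToNumber(items[i].quantity);
    }
    return total;
}

// The two <Item> elements from the sample file.
const ItemText kSampleItems[] = {
    { "4.56", "1" },
    { "4.58", "2" },
};
```

For the sample data, PurchaseOrderTotal(kSampleItems, 2) works out to 4.56 x 1 + 4.58 x 2 = 13.72.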
How did I memorize all that? I didn't. IntelliSense helps tremendously. I knew about getElementsByTagName(); it's one of the standard XML functions that all the XML processors support. The help told me what it returns, an IXMLDOMNodeList, and I tacked Ptr on the end so I could get the helper class created from the #import. Then it was just a matter of typing -> and seeing what IntelliSense offered.
There's an interesting convention at work in the helper classes that are created for you when you use #import. Function names that start with get_ take the address of something you allocated and put a value in it, like this:
items->get_length(&numitems);
Function names that start with Get (with no underscore) just return what you're looking for, like this:
double qty =
    item->attributes->getNamedItem("Quantity")->GetnodeValue();
Quite often both sets of functions are supported in a class, but the C++ samples in the help only show you the kind that takes a pointer. I guess we C++ people are not supposed to like having answers returned to us, or something. Anyway, remember that both sets are there.
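The relationship between the two styles can be sketched without COM. FakeNodeList below is a hypothetical stand-in, not the real IXMLDOMNodeList. Its get_length() fills in a caller-supplied variable and returns a status code, and its Getlength() wraps that call and returns the value directly, turning a failure into an exception. The wrappers generated by #import work along the same lines, with an HRESULT as the status and a _com_error as the exception.

```cpp
#include <stdexcept>

// Hypothetical stand-in for a COM-style list; not the real IXMLDOMNodeList.
class FakeNodeList
{
public:
    explicit FakeNodeList(long length) : length_(length) {}

    // "get_" style: the caller passes the address of a variable, and the
    // return value is only a status code (0 means success in this sketch).
    int get_length(long* out) const
    {
        if (out == nullptr)
        {
            return -1;  // failure: nowhere to put the answer
        }
        *out = length_;
        return 0;
    }

    // "Get" style: returns the answer directly and reports failure
    // by throwing instead of returning a status code.
    long Getlength() const
    {
        long value = 0;
        if (get_length(&value) != 0)
        {
            throw std::runtime_error("get_length failed");
        }
        return value;
    }

private:
    long length_;
};
```

With a FakeNodeList of two items, both list.get_length(&n) and n = list.Getlength() leave 2 in n. The difference is only in how the answer and any failure come back to the caller.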
When this application runs, it prints out:
Document loaded ok.
Purchase Order total is $13.72
That's how simple it is to run through a file of XML and do something with it. But why stop there? You can transform XML from one layout to another, or to HTML, or to any number of other formats. You can write it out to a file or the screen pretty easily, too:
_bstr_t text = xmlDoc->Getxml();
char* printable = text;
cout << printable << endl;
This code uses the helper class _bstr_t, which wraps a BSTR and provides a conversion to a char*. You need to ask for the conversion, though; you can't just send the BSTR to cout. Still, this is pretty neat stuff. I encourage you to play with XML whenever you get the chance. It really is changing everything.
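A BSTR holds wide characters, which is why a conversion is needed before the text can go to a narrow stream such as cout. Here's a rough, portable sketch of that kind of wide-to-narrow conversion. ToNarrow() is a hypothetical helper built on the standard std::wcstombs(); it is not how _bstr_t is implemented, and it only handles characters that the current C locale can represent.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: converts a wide string to a narrow one using the
// current C locale. Returns an empty string if a character cannot be
// represented in that locale.
std::string ToNarrow(const std::wstring& wide)
{
    // Worst case, every wide character expands to MB_CUR_MAX bytes;
    // the extra byte leaves room for the terminating null.
    std::vector<char> buffer(wide.size() * MB_CUR_MAX + 1);

    std::size_t written =
        std::wcstombs(buffer.data(), wide.c_str(), buffer.size());
    if (written == static_cast<std::size_t>(-1))
    {
        return std::string();  // conversion failed
    }
    return std::string(buffer.data(), written);
}
```

Once the text is narrow, it can be sent to cout like any other string, for example cout << ToNarrow(L"<PurchaseOrder/>") << endl;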
If you were never a COM programmer, get hives from HRESULTs, and shudder at the thought of a BSTR, take heart! Next time you'll see this same work done the .NET way.
About the Author
Kate Gregory is a founding partner of Gregory Consulting Limited (www.gregcons.com). In January 2002, she was appointed MSDN Regional Director for Toronto, Canada. Her experience with C++ stretches back to before Visual C++ existed. She is a well-known speaker and lecturer at colleges and Microsoft events on subjects such as .NET, Visual Studio, XML, UML, C++, Java, and the Internet. Kate and her colleagues at Gregory Consulting specialize in combining software development with Web site development to create active sites. They build quality custom and off-the-shelf software components for Web pages and other applications. Kate is the author of numerous books for Que, including Special Edition Using Visual C++ .NET.