Learn how to harvest data from your XML files
with the help of the XML Path (XPath) Language
With the release of Office 2003 in October 2003, Microsoft
placed a big bet on the future of XML. Three years later, XML is more popular
than ever, and now Microsoft has released Office 2007—the most XML-friendly
version of Office ever.
As Office XML files proliferate, the mountains of data
inside those files grow larger by the day. Fortunately, data tagged with XML
markup needn’t remain locked inside the files where the data is stored.
Instead, with the help of the XML Path Language (more commonly known as XPath),
you can harvest the data in XML files with the speed and precision of a robotic
If this strikes you as something that only a crazed data-holic
or an IT specialist would contemplate, wait until you see how easy it can be to
navigate the hierarchical structures found inside XML data files.
You already know how to navigate hierarchical data structures
Keep in mind that you’ve been navigating hierarchical data
structures for as long as you’ve been working in Windows. For example, consider
the following set of nested folders on a hard drive:
Figure 1 - A familiar example of a hierarchical data
structure: nested folders on a hard drive.
If you started at the root folder of the hard drive (C:\)
and had to find your way to the FIRSTNAME folder near the bottom of Figure 1,
how would you get there?
I suppose you could check every folder on the hard drive
until you found one called FIRSTNAME, but what if there are multiple folders
called FIRSTNAME? How could you be certain that you found the folder shown at
the bottom of Figure 1?
The answer is both simple and obvious. To get to the desired
folder, you would start at the root folder (C:\), and inside that folder you
would open the BOOKS folder, and inside that folder you would open the BOOK
folder, and inside that folder you would open the AUTHOR folder, and inside
that folder you would open the FIRSTNAME folder.
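The walk described above can be sketched in a few lines of Python. This is only an illustration of the idea, not something the article itself relies on: the helper name walk_path is made up for this sketch, and the folder names are the ones listed in the preceding paragraph.

```python
from pathlib import PureWindowsPath


def walk_path(root, folder_names):
    """Return every stop along the way from the root to the last folder."""
    path = PureWindowsPath(root)
    steps = [path]
    for name in folder_names:
        # Open the next folder inside the folder we are currently in.
        path = path / name
        steps.append(path)
    return steps


# Folder names taken from the walk described in the text.
steps = walk_path("C:\\", ["BOOKS", "BOOK", "AUTHOR", "FIRSTNAME"])
for step in steps:
    print(step)
```

Each printed line is one stop on the walk; the last line is the full pathname of the folder you were looking for.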
This is sometimes referred to as “walking the hierarchy” or
“walking the path” to the desired folder. Indeed, in Windows (as in other file
systems), the full name of a folder is referred to as its pathname. The
pathname of the folder at the bottom of Figure 1 is:
An XML data hierarchy is very similar to a set of nested folders on a hard drive
If you think it’s useful to walk a path to a particular
folder on a hard drive, just imagine being able to walk a path inside an XML
file until you come to the exact individual element or collection of elements
of data that your boss wants on her desk this instant! That’s exactly what the
XPath language enables you to do.
Consider the following XML markup inside a Microsoft Word
document. (See Figure 2.) As you can see, the markup describes a hierarchical
data structure very similar to the nested folders in Figure 1.
Figure 2 - Another hierarchical data structure: nested XML
elements inside a Microsoft Word document.
In this case, there is a BOOKS element (instead of a BOOKS folder)
with a BOOK element nested inside it. Inside the BOOK element is a TITLE element.
Inside the TITLE element is an AUTHOR element. And inside the AUTHOR element are
a FIRSTNAME element and a LASTNAME element.
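To make the resemblance to folder pathnames concrete, here is a minimal Python sketch that builds the nested elements as described in the preceding paragraph and prints a slash-separated path for each one. The sample values (Jane, Doe) and the helper name element_paths are invented for illustration, and namespaces are left out to keep the sketch short.

```python
import xml.etree.ElementTree as ET

# Element names and nesting follow the description in the text;
# the text values are invented sample data.
books = ET.Element("BOOKS")
book = ET.SubElement(books, "BOOK")
title = ET.SubElement(book, "TITLE")
author = ET.SubElement(title, "AUTHOR")
ET.SubElement(author, "FIRSTNAME").text = "Jane"
ET.SubElement(author, "LASTNAME").text = "Doe"


def element_paths(element, parent_path=""):
    """Return the path of this element and of every element nested in it."""
    path = f"{parent_path}/{element.tag}"
    paths = [path]
    for child in element:
        paths.extend(element_paths(child, path))
    return paths


paths = element_paths(books)
for path in paths:
    print(path)
```

Read from left to right, each printed path names the elements you pass through on the way down, just as a pathname names the folders you open.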
Although you can’t tell just by looking at the document, the
elements in the document belong to a particular namespace that distinguishes
them from similarly named elements in other namespaces. The namespace is
Navigating an XML data hierarchy is very similar to navigating a set of nested folders on a hard drive
In order to walk a path to the data you’re interested in,
you need to specify the namespace that the elements belong to, and then you
need to specify the path that you want to walk.
To specify the namespace, you use a statement such as the following:
To specify the path that starts at the BOOKS element, and
which then drills down to the BOOK element, you use the following statement:
The point here isn’t to fully explain the XPath language,
because there are many articles that already do that. (A link to a particularly
useful one is provided at the end of this article.) Rather, the point here is
to help you recognize that walking a path to a particular element of data
inside an XML file has a lot in common with walking a path to a particular
folder on a hard drive.
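The two steps described in this section, specifying a namespace and then specifying a path, can also be tried outside Word. The following is a minimal sketch using Python's standard xml.etree.ElementTree module, which supports a subset of XPath. It does not reproduce the statements used in Word: the namespace URI urn:example:books, the prefix ns0, and the sample names are all placeholders invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and sample values. Only the element names and
# their nesting are taken from the article's description.
BOOKS_NS = "urn:example:books"

SAMPLE_XML = f"""\
<BOOKS xmlns="{BOOKS_NS}">
  <BOOK>
    <TITLE>
      <AUTHOR><FIRSTNAME>Jane</FIRSTNAME><LASTNAME>Doe</LASTNAME></AUTHOR>
    </TITLE>
  </BOOK>
  <BOOK>
    <TITLE>
      <AUTHOR><FIRSTNAME>John</FIRSTNAME><LASTNAME>Roe</LASTNAME></AUTHOR>
    </TITLE>
  </BOOK>
</BOOKS>
"""

root = ET.fromstring(SAMPLE_XML)  # root is the BOOKS element

# Step 1: specify the namespace by mapping a prefix (ns0) to its URI.
namespaces = {"ns0": BOOKS_NS}

# Step 2: specify the path. ElementTree searches relative to an element,
# so starting at BOOKS and drilling down to BOOK is written "ns0:BOOK".
book_elements = root.findall("ns0:BOOK", namespaces)

# ".//" means "any depth below here", so this finds FIRSTNAME wherever
# it is nested inside each BOOK.
first_names = [
    first_name.text
    for book in book_elements
    for first_name in book.findall(".//ns0:FIRSTNAME", namespaces)
]

print(len(book_elements))
print(first_names)
```

Without the namespace mapping, the same path finds nothing, because a BOOK element in a namespace is a different element from a BOOK element with no namespace.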
XML-enabled versions of Word 2003 and Word 2007 can help you make powerful
use of XPath
Starting with Word 2003, XPath expressions can be used in
INCLUDETEXT fields to pull data from an XML document into a Word document. The
XML document can be an ordinary XML text file or it can be an Office document
(such as an Excel spreadsheet or Word document) saved in XML format. For more
information about the use of XPath expressions in INCLUDETEXT fields, look up
INCLUDETEXT fields in the Word 2003 or Word 2007 Help system, or visit the following
web page:
Starting with Word 2007, XPath expressions can be used to
link content controls to a Word document's XML datastore. Since the XML
datastore can be accessed by external programs, content controls linked to the
datastore can automatically display XML data from external programs in the
Word document. For more information about the use of XPath expressions with content
controls, look up content controls in the Word 2007 Help system, or visit the
following web page:
XPath Explorer helps you learn how to
leverage the power of the XPath language
Although the fundamentals of the XPath language are very
simple, the language is very powerful. The best way to explore the language is
to work with it in a situation where you can see immediately the data returned
by your XPath expressions. The XPath Explorer lets you do exactly that. (See
Figure 3.)
Figure 3 - XPath Explorer, a new freeware tool developed by
Microsoft Word MVP Bill Coan.
XPath Explorer is compatible with Word 2003 and Word 2007,
and it works with any document that contains XML markup (including arbitrary
XML files opened in Word). It can generate an XML document suitable for
experimentation if you don't have one at hand.
When you work with a sample XML document created by XPath
Explorer, you can experiment with 30 built-in XPath expressions for that
sample, but you can also enter any arbitrary XPath expression that interests
you.
The tool lets you view the results of an XPath expression
complete with XML markup (including WordprocessingML markup if desired) or as
plain text with no markup.
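The difference between the two views is easy to see in a small Python sketch. This is not how XPath Explorer works internally; it only illustrates the same result shown with and without markup. The helper names as_markup and as_plain_text and the sample values are invented for this example.

```python
import xml.etree.ElementTree as ET

# Invented sample result: an AUTHOR element such as an XPath expression
# might return.
author = ET.fromstring(
    "<AUTHOR><FIRSTNAME>Jane</FIRSTNAME><LASTNAME>Doe</LASTNAME></AUTHOR>"
)


def as_markup(element):
    """Return the element complete with its XML markup."""
    return ET.tostring(element, encoding="unicode")


def as_plain_text(element):
    """Return only the text inside the element, with the markup removed."""
    pieces = (text.strip() for text in element.itertext())
    return " ".join(piece for piece in pieces if piece)


markup_view = as_markup(author)
text_view = as_plain_text(author)
print(markup_view)
print(text_view)
```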
You can download a free copy of XPath Explorer here:
For more information about the XPath language
For more information about the XPath language, visit the
following web page: