Logo: TechTrax...brought to you by MouseTrax Computing Solutions

Unlock the Data in Your XML Files with a new freeware tool called XPath Explorer

by Bill Coan, MVP


Skill rating level 4.

Learn how to harvest data from your XML files with the help of the XML Path (XPath) Language

With the release of Office 2003 in October 2003, Microsoft placed a big bet on the future of XML. Three years later, XML is more popular than ever, and now Microsoft has released Office 2007, the most XML-friendly version of Office ever.

As Office XML files proliferate, the mountains of data inside those files grow larger by the day. Fortunately, data tagged with XML markup needn’t remain locked inside the files where the data is stored. Instead, with the help of the XML Path Language (more commonly known as XPath), you can harvest the data in XML files with the speed and precision of a robotic surgical arm.

If this strikes you as something that only a crazed data-holic or an IT specialist would contemplate, wait until you see how easy it can be to navigate the hierarchical structures found inside XML data files.

You already know how to navigate hierarchical data structures

Keep in mind that you’ve been navigating hierarchical data structures for as long as you’ve been working in Windows. For example, consider the following set of nested folders on a hard drive:

Figure 1 - A familiar example of a hierarchical data structure: nested folders on a hard drive.

If you started at the root folder of the hard drive (C:\) and had to find your way to the FIRSTNAME folder near the bottom of Figure 1, how would you get there?

I suppose you could check every folder on the hard drive until you found one called FIRSTNAME, but what if there are multiple folders called FIRSTNAME? How could you be certain that you found the folder shown at the bottom of Figure 1?

The answer is both simple and obvious. To get to the desired folder, you would start at the root folder (C:\), and inside that folder you would open the BOOKS folder, and inside that folder you would open the BOOK folder, and inside that folder you would open the AUTHOR folder, and inside that folder you would open the FIRSTNAME folder.

This is sometimes referred to as “walking the hierarchy” or “walking the path” to the desired folder. Indeed, in Windows (as in other file systems), the full name of a folder is referred to as its pathname. The pathname of the folder at the bottom of Figure 1 is:

C:\BOOKS\BOOK\AUTHOR\FIRSTNAME
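To make the idea of walking the path concrete, here is a minimal Python sketch that splits the pathname above into the individual steps you would take from the root folder. It uses only the standard `pathlib` module; the variable names are illustrative and not part of any tool discussed in this article.

```python
from pathlib import PureWindowsPath

# The pathname from the article, treated as a Windows path on any platform.
path = PureWindowsPath(r"C:\BOOKS\BOOK\AUTHOR\FIRSTNAME")

# Each part is one step of the walk, starting at the root folder.
steps = path.parts

# Print the steps as an indented outline, one level per folder.
for depth, step in enumerate(steps):
    print("  " * depth + step)
```

Each printed line corresponds to one folder you would open on the way down, which is the same idea that XPath applies to XML elements.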

An XML data hierarchy is very similar to a set of nested folders on a hard drive

If you think it’s useful to walk a path to a particular folder on a hard drive, just imagine being able to walk a path inside an XML file until you come to the exact element, or collection of elements, that your boss wants on her desk this instant! That’s exactly what the XPath language enables you to do.

Consider the following XML markup inside a Microsoft Word document. (See Figure 2.) As you can see, the markup describes a hierarchical data structure very similar to the nested folders in Figure 1.

Figure 2 - Another hierarchical data structure: nested XML elements inside a Microsoft Word document.

In this case, there is a BOOKS element (instead of a BOOKS folder) with a BOOK element nested inside it. Inside the BOOK element are a TITLE element and an AUTHOR element. And inside the AUTHOR element are a FIRSTNAME element and a LASTNAME element.

Although you can’t tell just by looking at the document, the elements in the document belong to a particular namespace that distinguishes them from similarly named elements in other namespaces. The namespace is http://www.wordsite.com/books (with no trailing period).
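For readers who want to see the hierarchy as raw markup, the following Python sketch builds a small stand-in document and prints its outline. Only the element names and the namespace come from this article; the title and author values are invented placeholders, and the layout assumes that TITLE and AUTHOR sit side by side inside BOOK, in line with the folder layout in Figure 1.

```python
import xml.etree.ElementTree as ET

NS = "http://www.wordsite.com/books"  # namespace named in the article

# Hypothetical sample data: the text values are placeholders, not from Figure 2.
SAMPLE = f"""\
<BOOKS xmlns="{NS}">
  <BOOK>
    <TITLE>Example Title</TITLE>
    <AUTHOR>
      <FIRSTNAME>Jane</FIRSTNAME>
      <LASTNAME>Doe</LASTNAME>
    </AUTHOR>
  </BOOK>
</BOOKS>
"""

root = ET.fromstring(SAMPLE)


def outline(element, depth=0):
    """Return one indented line per element, showing its namespace-qualified tag."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(root)))
```

Python's standard parser reports every tag with the namespace in braces in front of the element name, which shows that the namespace really is part of each element's identity even though it appears only once in the markup.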

Navigating an XML data hierarchy is very similar to navigating a set of nested folders

In order to walk a path to the data you’re interested in, you need to specify the namespace that the elements belong to, and then you need to specify the path that you want to walk.

To specify the namespace, you use a namespace declaration such as the following, which maps the prefix x to the namespace:

xmlns:x="http://www.wordsite.com/books"

To specify the path that starts at the BOOKS element, and which then drills down to the BOOK element, you use the following XPath expression:

/x:BOOKS/x:BOOK
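If you would like to try the same expression outside Word, here is a rough Python sketch. It is not how Word or XPath Explorer evaluates paths. Python's standard `xml.etree.ElementTree` module supports only a subset of XPath and evaluates paths relative to an element, so the hypothetical helper `select` checks the first step against the root element and hands the remaining steps to `findall`. The sample data is invented; only the element names, the prefix x, and the namespace come from this article.

```python
import xml.etree.ElementTree as ET

# Prefix-to-namespace mapping, equivalent to xmlns:x="http://www.wordsite.com/books".
NAMESPACES = {"x": "http://www.wordsite.com/books"}

# Hypothetical sample data; the text values are placeholders.
SAMPLE = """\
<BOOKS xmlns="http://www.wordsite.com/books">
  <BOOK>
    <TITLE>First Example Title</TITLE>
    <AUTHOR><FIRSTNAME>Jane</FIRSTNAME><LASTNAME>Doe</LASTNAME></AUTHOR>
  </BOOK>
  <BOOK>
    <TITLE>Second Example Title</TITLE>
    <AUTHOR><FIRSTNAME>John</FIRSTNAME><LASTNAME>Roe</LASTNAME></AUTHOR>
  </BOOK>
</BOOKS>
"""


def select(root, absolute_path, namespaces):
    """Evaluate a simple absolute path such as /x:BOOKS/x:BOOK (illustrative helper)."""
    steps = absolute_path.strip("/").split("/")
    # The first step must name the root element itself.
    prefix, local_name = steps[0].split(":")
    if root.tag != "{%s}%s" % (namespaces[prefix], local_name):
        return []
    if len(steps) == 1:
        return [root]
    # The remaining steps are walked relative to the root element.
    return root.findall("/".join(steps[1:]), namespaces)


root = ET.fromstring(SAMPLE)
books = select(root, "/x:BOOKS/x:BOOK", NAMESPACES)
first_names = select(root, "/x:BOOKS/x:BOOK/x:AUTHOR/x:FIRSTNAME", NAMESPACES)

print(len(books))                              # 2
print([name.text for name in first_names])     # ['Jane', 'John']
```

Notice that one path can return a collection of elements: the longer path returns every FIRSTNAME in the document, just as a wildcard search can match many folders.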

The point here isn’t to fully explain the XPath language, because there are many articles that already do that. (A link to a particularly useful one is provided at the end of this article.) Rather, the point here is to help you recognize that walking a path to a particular element of data inside an XML file has a lot in common with walking a path to a particular folder on a hard drive.

XML-enabled versions of Word 2003 and Word 2007 can help you make powerful use of XPath

Starting with Word 2003, XPath expressions can be used in INCLUDETEXT fields to pull data from an XML document into a Word document. The XML document can be an ordinary XML text file or it can be an Office document (such as an Excel spreadsheet or Word document) saved in XML format. For more information about the use of XPath expressions in INCLUDETEXT fields, look up INCLUDETEXT fields in the Word 2003 or Word 2007 Help system, or visit the following web page:

http://office.microsoft.com/assistance/hfws.aspx?AssetID=HP051861651033&CTT=1&Origin=EC010227131033

Starting with Word 2007, XPath expressions can be used to link content controls to a Word document's XML datastore. Since the XML datastore can be accessed by external programs, content controls linked to the datastore can automatically display XML data from external programs in the Word document. For more information about the use of XPath expressions with content controls, look up content controls in the Word 2007 Help system, or visit the following web page:

http://channel9.msdn.com/ShowPost.aspx?PostID=254539

XPath Explorer helps you learn how to leverage the power of the XPath language

Although the fundamentals of the XPath language are very simple, the language is very powerful. The best way to explore the language is to work with it in a situation where you can see immediately the data returned by your XPath expressions. The XPath Explorer lets you do exactly that. (See Figure 3.)

Figure 3 - XPath Explorer, a new freeware tool developed by Microsoft Word MVP Bill Coan.

XPath Explorer is compatible with Word 2003 and Word 2007, and it works with any document that contains XML markup (including arbitrary XML files opened in Word). It can generate an XML document suitable for experimentation if you don't have one at hand.

When you work with a sample XML document created by XPath Explorer, you can experiment with 30 built-in XPath expressions for that sample, but you can also enter any arbitrary XPath expression that interests you.

The tool lets you view the results of an XPath expression complete with XML markup (including WordprocessingML markup if desired) or as plain text with no markup.
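The difference between the two views can be illustrated with a short Python sketch. This is only an analogy built on the standard `xml.etree.ElementTree` module, not a description of how XPath Explorer produces its output, and the sample data is a made-up placeholder that reuses the element names and namespace from this article.

```python
import xml.etree.ElementTree as ET

URI = "http://www.wordsite.com/books"
NAMESPACES = {"x": URI}

# Ask the serializer to write the prefix x instead of an auto-generated one.
ET.register_namespace("x", URI)

# Hypothetical sample data; the text values are placeholders.
SAMPLE = """\
<BOOKS xmlns="http://www.wordsite.com/books">
  <BOOK>
    <TITLE>Example Title</TITLE>
    <AUTHOR><FIRSTNAME>Jane</FIRSTNAME><LASTNAME>Doe</LASTNAME></AUTHOR>
  </BOOK>
</BOOKS>
"""

root = ET.fromstring(SAMPLE)
author = root.find("x:BOOK/x:AUTHOR", NAMESPACES)

# View 1: the selected element complete with its XML markup.
with_markup = ET.tostring(author, encoding="unicode")

# View 2: only the text inside the element, with the markup stripped away.
plain_text = " ".join(t.strip() for t in author.itertext() if t.strip())

print(with_markup)
print(plain_text)
```

The first view is useful when you need to see exactly which elements an expression matched; the second is what you would want when the result is destined for the body of a document.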

You can download a free copy of XPath Explorer here:

http://www.wordsite.com/downloads/xpathexplorer.htm

For more information about the XPath language

For more information about the XPath language, visit the following web page:

http://msdn2.microsoft.com/en-gb/library/ms256122.aspx
