XPath is a language for navigating XML documents and identifying the elements in them. It is flexible enough that technologies such as XQuery and XPointer are built on it. XPath uses path expressions to select nodes or node-sets in an XML document. These expressions look much like the paths you use in a traditional computer file system.
When you use XPath, an XML document is treated as a tree of nodes. Consider the following example:
<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
  <book>
    <title lang="en">The Joke</title>
    <author>Milan Kundera</author>
    <price>350</price>
  </book>
  <book>
    <title lang="en">After Dark</title>
    <author>Haruki Murakami</author>
    <price>450</price>
  </book>
</bookstore>
Here, <bookstore> is the root element node. <author> is an element node, and the lang attribute is an attribute node. The nodes are also related hierarchically: <title>, <author> and <price> are children of <book>, and they are siblings of one another.
XPath uses path expressions like the following to select nodes in an XML document:
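To see this tree the way a program does, here is a minimal Java sketch that parses the sample with the JDK's built-in DOM parser (javax.xml.parsers) and prints one line per node. The class and method names (BookstoreTree, describe, tree) are my own and only for illustration. The sketch assumes Java 11 or later, because it uses String.repeat.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BookstoreTree {
    // The sample bookstore document from above (XML declaration omitted).
    static final String XML = "<bookstore>"
        + "<book><title lang=\"en\">The Joke</title><author>Milan Kundera</author>"
        + "<price>350</price></book>"
        + "<book><title lang=\"en\">After Dark</title><author>Haruki Murakami</author>"
        + "<price>450</price></book>"
        + "</bookstore>";

    // Parses the sample with the JDK's built-in DOM parser.
    static Document parse() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse the sample XML", e);
        }
    }

    // Appends one line per node: indentation shows depth, the first word shows the node type.
    static void describe(Node node, int depth, StringBuilder out) {
        String indent = "  ".repeat(depth);
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.append(indent).append("element ").append(node.getNodeName()).append('\n');
            // Attributes hang off their element instead of appearing among its children.
            NamedNodeMap attributes = node.getAttributes();
            for (int i = 0; i < attributes.getLength(); i++) {
                Node attribute = attributes.item(i);
                out.append(indent).append("  attribute ").append(attribute.getNodeName())
                    .append('=').append(attribute.getNodeValue()).append('\n');
            }
        } else if (node.getNodeType() == Node.TEXT_NODE) {
            out.append(indent).append("text \"").append(node.getNodeValue()).append("\"\n");
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            describe(children.item(i), depth + 1, out);
        }
    }

    // Describes the whole tree, starting from the root element <bookstore>.
    static String tree() {
        StringBuilder out = new StringBuilder();
        describe(parse().getDocumentElement(), 0, out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(tree());
    }
}
```

The output lists each book indented beneath bookstore, with title, author and price one level deeper. The lang attribute is reached through getAttributes() and not through the child list. The XPath data model follows the same rule: an element is the parent of its attributes, but the attributes are not its children.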
- /bookstore/book - selects all <book> elements that are children of <bookstore>
- bookstore//book - selects all <book> elements that are descendants of the <bookstore> element
- //@lang - selects all attributes named lang
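You can try these expressions from Java with the JDK's javax.xml.xpath API. The sketch below is a minimal example, and its names (BasicPaths, select) are my own. It evaluates each path with the document node as the context and reports elements by name and attributes as @name=value.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BasicPaths {
    // The sample bookstore document (XML declaration omitted).
    static final String XML = "<bookstore>"
        + "<book><title lang=\"en\">The Joke</title><author>Milan Kundera</author>"
        + "<price>350</price></book>"
        + "<book><title lang=\"en\">After Dark</title><author>Haruki Murakami</author>"
        + "<price>450</price></book>"
        + "</bookstore>";

    // Parses the sample with the JDK's built-in DOM parser.
    static Document parse() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
    }

    // Evaluates an expression with the document node as context. Elements are
    // reported by name, attributes as @name=value.
    static List<String> select(String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, parse(), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node node = nodes.item(i);
                if (node.getNodeType() == Node.ATTRIBUTE_NODE) {
                    result.add("@" + node.getNodeName() + "=" + node.getNodeValue());
                } else {
                    result.add(node.getNodeName());
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        for (String expression : new String[] {"/bookstore/book", "bookstore//book", "//@lang"}) {
            System.out.println(expression + " -> " + select(expression));
        }
    }
}
```

For the sample document, each path selects two nodes. The first two select the two book elements, and the third selects the two lang attributes.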
Predicates, written in square brackets, select specific nodes:
- /bookstore/book[1] - selects the first <book> child of <bookstore> (XPath positions start at 1, not 0)
- /bookstore/book[last()-1] - selects the second-to-last <book> element
- /bookstore/book[position()<3] - selects the first two <book> elements
- //title[@lang] - selects all <title> elements that have a lang attribute
- /bookstore/book[price>350]/title - selects the titles of all books whose price is greater than 350
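Here is a similar sketch for the predicate examples, again using only javax.xml.xpath. The helper, Predicates.select, is a name I made up. It labels each selected book by its title, which it finds by evaluating the relative path title with the book as the context node.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class Predicates {
    // The sample bookstore document (XML declaration omitted).
    static final String XML = "<bookstore>"
        + "<book><title lang=\"en\">The Joke</title><author>Milan Kundera</author>"
        + "<price>350</price></book>"
        + "<book><title lang=\"en\">After Dark</title><author>Haruki Murakami</author>"
        + "<price>450</price></book>"
        + "</bookstore>";

    // Parses the sample with the JDK's built-in DOM parser.
    static Document parse() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
    }

    // Evaluates an expression against the sample and labels each selected node:
    // a <book> element by its title, any other node by its text content.
    static List<String> select(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, parse(), XPathConstants.NODESET);
            List<String> labels = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node node = nodes.item(i);
                // "title" is a relative path, evaluated with the book as the context node.
                labels.add("book".equals(node.getNodeName())
                    ? xpath.evaluate("title", node)
                    : node.getTextContent());
            }
            return labels;
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String[] expressions = {
            "/bookstore/book[1]",
            "/bookstore/book[last()-1]",
            "/bookstore/book[position()<3]",
            "//title[@lang]",
            "/bookstore/book[price>350]/title"
        };
        for (String expression : expressions) {
            System.out.println(expression + " -> " + select(expression));
        }
    }
}
```

The sample has only two books, so book[last()-1] and book[1] pick the same one, The Joke. The price>350 filter keeps only After Dark.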
XPath also supports wildcards: * matches any element node, and @* matches any attribute node.
- /bookstore/* - selects all child elements of <bookstore>
- //title[@*] - selects all <title> elements that have at least one attribute
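Here is a sketch of the wildcard examples, built the same way as the earlier Java snippets (JDK XPath API, illustrative class and method names). I added /bookstore/book[1]/* to show that * matches every child element whatever its name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class Wildcards {
    // The sample bookstore document (XML declaration omitted).
    static final String XML = "<bookstore>"
        + "<book><title lang=\"en\">The Joke</title><author>Milan Kundera</author>"
        + "<price>350</price></book>"
        + "<book><title lang=\"en\">After Dark</title><author>Haruki Murakami</author>"
        + "<price>450</price></book>"
        + "</bookstore>";

    // Parses the sample with the JDK's built-in DOM parser.
    static Document parse() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
    }

    // Returns the names of the nodes an expression selects from the document node.
    static List<String> names(String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, parse(), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getNodeName());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        for (String expression : new String[] {"/bookstore/*", "/bookstore/book[1]/*", "//title[@*]"}) {
            System.out.println(expression + " -> " + names(expression));
        }
    }
}
```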
The full syntax is documented in the W3C XPath specification.
So far, everything has been fairly simple. Now comes the most flexible and useful feature of XPath.
XPath Axes
An axis defines a node-set relative to the current node. When we say "the children of the current node", the children form a node-set, so child is an axis. Similarly, parent, following-sibling, preceding-sibling and attribute are all axes. The W3C XPath specification lists all thirteen.
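The parent and sibling axes can be tried out with a short Java sketch in the same style. Each step spells out its axis in the axis::nodetest form described below. The class name TreeAxes and the particular expressions are mine, for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TreeAxes {
    // The sample bookstore document (XML declaration omitted).
    static final String XML = "<bookstore>"
        + "<book><title lang=\"en\">The Joke</title><author>Milan Kundera</author>"
        + "<price>350</price></book>"
        + "<book><title lang=\"en\">After Dark</title><author>Haruki Murakami</author>"
        + "<price>450</price></book>"
        + "</bookstore>";

    // Parses the sample with the JDK's built-in DOM parser.
    static Document parse() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
    }

    // Returns the names of the nodes an expression selects from the document node.
    static List<String> names(String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, parse(), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getNodeName());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String[] expressions = {
            "//author/parent::*",                              // the book holding each author
            "//author/..",                                     // same thing, abbreviated
            "/bookstore/book[1]/title/following-sibling::*",   // siblings after the first title
            "//price/ancestor::bookstore"                      // shared ancestor, listed once
        };
        for (String expression : expressions) {
            System.out.println(expression + " -> " + names(expression));
        }
    }
}
```

The last expression returns a single bookstore even though it starts from two price elements, because a node-set never contains the same node twice. //author/.. gives the same result as //author/parent::*, since .. abbreviates parent::node().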
As the examples above show, XPath has two kinds of location path: absolute and relative.
An absolute location path starts with a slash (/), and a relative location path does not. In both cases, the location path consists of one or more steps separated by slashes.
An absolute location path: /step/step/...
A relative location path: step/step/...
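The difference shows up once a path is evaluated from a particular node. In this Java sketch, the second book is chosen as the context node. The names ContextPaths and evaluateFrom are mine. The relative path title is resolved from that book, while the absolute path /bookstore/book/title starts at the root regardless.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ContextPaths {
    // The sample bookstore document (XML declaration omitted).
    static final String XML = "<bookstore>"
        + "<book><title lang=\"en\">The Joke</title><author>Milan Kundera</author>"
        + "<price>350</price></book>"
        + "<book><title lang=\"en\">After Dark</title><author>Haruki Murakami</author>"
        + "<price>450</price></book>"
        + "</bookstore>";

    // Parses the sample with the JDK's built-in DOM parser.
    static Document parse() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
    }

    // Uses contextPath (an absolute path) to pick a context node, then evaluates
    // expression from that node and returns the result as a string.
    static String evaluateFrom(String contextPath, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node context = (Node) xpath.evaluate(contextPath, parse(), XPathConstants.NODE);
            return xpath.evaluate(expression, context);
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Relative: resolved against the second <book>, so this prints "After Dark".
        System.out.println(evaluateFrom("/bookstore/book[2]", "title"));
        // Absolute: starts at the root whatever the context node is, so this prints "The Joke".
        System.out.println(evaluateFrom("/bookstore/book[2]", "/bookstore/book/title"));
    }
}
```

The absolute path selects both titles. It prints The Joke because converting a node-set to a string takes the string-value of its first node in document order.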
Each step consists of:
- an axis, which defines the tree relationship between the selected nodes and the current node
- a node test, which identifies nodes within the axis
- zero or more predicates, which further refine the selected node-set
In general, a step has this form:
axisname::nodetest[predicate]
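Drop the axis and the predicates, and you are left with the kind of steps used in the earlier examples. When a step names no axis, child:: is assumed, so /bookstore/book is short for /child::bookstore/child::book.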
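One way to check that the abbreviated and unabbreviated forms mean the same thing is to evaluate both and compare the nodes they select. The Java sketch below does that with the JDK's XPath API. AbbreviatedSteps and sameSelection are names I chose. The unabbreviated forms follow the XPath 1.0 rules that // stands for /descendant-or-self::node()/ and @ stands for attribute::.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AbbreviatedSteps {
    // The sample bookstore document (XML declaration omitted).
    static final String XML = "<bookstore>"
        + "<book><title lang=\"en\">The Joke</title><author>Milan Kundera</author>"
        + "<price>350</price></book>"
        + "<book><title lang=\"en\">After Dark</title><author>Haruki Murakami</author>"
        + "<price>450</price></book>"
        + "</bookstore>";

    // Parses the sample with the JDK's built-in DOM parser.
    static Document parse() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
    }

    // Evaluates both expressions on the same parsed document and returns true only if
    // they select the same non-empty sequence of DOM nodes.
    static boolean sameSelection(String abbreviated, String unabbreviated) {
        try {
            Document doc = parse();
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList a = (NodeList) xpath.evaluate(abbreviated, doc, XPathConstants.NODESET);
            NodeList b = (NodeList) xpath.evaluate(unabbreviated, doc, XPathConstants.NODESET);
            if (a.getLength() == 0 || a.getLength() != b.getLength()) {
                return false;
            }
            for (int i = 0; i < a.getLength(); i++) {
                // The same node object, not merely equal content.
                if (!a.item(i).isSameNode(b.item(i))) {
                    return false;
                }
            }
            return true;
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate expressions", e);
        }
    }

    public static void main(String[] args) {
        String[][] pairs = {
            {"/bookstore/book", "/child::bookstore/child::book"},
            {"//title[@lang]", "/descendant-or-self::node()/child::title[attribute::lang]"},
            {"/bookstore/book[price>350]/title",
             "/child::bookstore/child::book[child::price > 350]/child::title"}
        };
        for (String[] pair : pairs) {
            System.out.println(pair[0] + " vs " + pair[1] + ": " + sameSelection(pair[0], pair[1]));
        }
    }
}
```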
Some more examples:
- /bookstore/child::book - selects all <book> elements that are children of <bookstore>
- /bookstore/book/attribute::* - selects all attributes of the <book> elements
- child::*/child::price - selects all <price> grandchildren of the current node
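These axis examples can also be run from Java. In this sketch, the first argument picks the context node, which matters only for the relative path. AxisSteps and select are illustrative names. The book elements in the sample have no attributes, so the attribute::* example selects nothing here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AxisSteps {
    // The sample bookstore document (XML declaration omitted).
    static final String XML = "<bookstore>"
        + "<book><title lang=\"en\">The Joke</title><author>Milan Kundera</author>"
        + "<price>350</price></book>"
        + "<book><title lang=\"en\">After Dark</title><author>Haruki Murakami</author>"
        + "<price>450</price></book>"
        + "</bookstore>";

    // Parses the sample with the JDK's built-in DOM parser.
    static Document parse() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
    }

    // Picks a context node with contextPath, evaluates expression from it,
    // and returns the selected nodes.
    static List<Node> select(String contextPath, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node context = (Node) xpath.evaluate(contextPath, parse(), XPathConstants.NODE);
            NodeList nodes = (NodeList) xpath.evaluate(expression, context, XPathConstants.NODESET);
            List<Node> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Absolute paths: the context node does not matter.
        System.out.println(select("/", "/bookstore/child::book").size());        // 2
        System.out.println(select("/", "/bookstore/book/attribute::*").size());  // 0
        // Relative path: the price grandchildren of <bookstore> (350, then 450)...
        for (Node price : select("/bookstore", "child::*/child::price")) {
            System.out.println(price.getTextContent());
        }
        // ...but from the document node, child::* is <bookstore>, which has no <price> children.
        System.out.println(select("/", "child::*/child::price").size());         // 0
    }
}
```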
Note that the first two examples are absolute paths and the last one is a relative path.
XPath can be evaluated from JavaScript or from a general-purpose language such as Java. I will write a post on that soon. XPath axes also deserve a post of their own.