Navigating XML Graph using Cypher

Cypher is a neat way to query and manipulate a Neo4j database. It would be equally amazing if an XML graph could be queried with Cypher as well.

Credit goes to Michael for suggesting this possibility here.

Well, let’s start with a simple XML file.

<library>
  <author firstname="Earnest" lastname="Hemingway">
    <works>
      <book name="A Farewell to Arms" year="1929" />
      <book name="For Whom the bell tolls" year="1940" />
      <book name="The Old man and the sea" year="1951" />
    </works>
    <awards>
      <award name="Pulitzer Prize" category="Fiction" year="1953"></award>
      <award name="Nobel Prize" category="Literature" year="1954"></award>
    </awards>
  </author>
  <author firstname="Victor" lastname="Hugo">
    <works>
      <book name="The Hunchback of Notre-Dame" year="1831" />
      <book name="Les Misérables" year="1862" />
    </works>
  </author>
</library>

It’s a simple XML file with nothing fancy in it. As explained in the previous posts here and here, a neat Neo4j graph can be made out of it.

[Screenshot: Neo4j graph generated from the XML above]
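
For readers who have not seen the earlier posts, here is a rough sketch of how an XML tree could be flattened into labelled nodes and parent-to-child relationships. It is not the importer used for this post: the tuple layout and the function name xml_to_graph are assumptions made purely for illustration, and the sample is cut down to a single book.

```python
import xml.etree.ElementTree as ET

# A cut-down version of the library file, enough to show the idea.
XML = """<library>
  <author firstname="Earnest" lastname="Hemingway">
    <works>
      <book name="A Farewell to Arms" year="1929" />
    </works>
  </author>
</library>"""


def xml_to_graph(root):
    """Hypothetical helper: flatten an element tree into nodes and edges.

    nodes: list of (node_id, label, properties), label being the tag name
    edges: list of (parent_id, child_id)
    """
    nodes, edges = [], []

    def visit(element, parent_id):
        node_id = len(nodes)
        # The tag becomes the label, the XML attributes become properties.
        nodes.append((node_id, element.tag, dict(element.attrib)))
        if parent_id is not None:
            edges.append((parent_id, node_id))
        for child in element:
            visit(child, node_id)

    visit(root, None)
    return nodes, edges


nodes, edges = xml_to_graph(ET.fromstring(XML))
for node in nodes:
    print(node)
print(edges)
```

Each element ends up as one node, so a label match such as (books:book) in Cypher corresponds to picking the nodes whose label is "book".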

So, let’s traverse this graph using Cypher. And since we are traversing XML, let’s make a rough comparison with XPath.

Let’s fetch all the book nodes.

The XPath to get all the book nodes, no matter where they are in the document, is:

//book

For the same purpose, the Cypher query would be:

MATCH (books:book) RETURN books

This returns the following output for the graph above:

[Image: bookNodes]
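
If you want to check the XPath side without a dedicated tool, Python’s standard xml.etree.ElementTree module understands a limited subset of XPath. The sketch below assumes the library file is held in a string; ".//book" is the relative form of "//book" that ElementTree expects.

```python
import xml.etree.ElementTree as ET

XML = """<library>
  <author firstname="Earnest" lastname="Hemingway">
    <works>
      <book name="A Farewell to Arms" year="1929" />
      <book name="For Whom the bell tolls" year="1940" />
      <book name="The Old man and the sea" year="1951" />
    </works>
    <awards>
      <award name="Pulitzer Prize" category="Fiction" year="1953"></award>
      <award name="Nobel Prize" category="Literature" year="1954"></award>
    </awards>
  </author>
  <author firstname="Victor" lastname="Hugo">
    <works>
      <book name="The Hunchback of Notre-Dame" year="1831" />
      <book name="Les Misérables" year="1862" />
    </works>
  </author>
</library>"""

root = ET.fromstring(XML)

# Every <book> element anywhere below the root, in document order.
books = root.findall(".//book")
print(len(books))
```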

Let’s now fetch the names of all the books. The XPath requires only a slight modification:

//book/@name

The XPath returns the following list:

Attribute='name="A Farewell to Arms"'
Attribute='name="For Whom the bell tolls"'
Attribute='name="The Old man and the sea"'
Attribute='name="The Hunchback of Notre-Dame"'
Attribute='name="Les Misérables"'
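
The same list can be reproduced in Python, with one caveat: ElementTree’s XPath subset cannot select attributes directly, so "//book/@name" has no literal equivalent. The sketch below, which is just one way of doing it, selects the book elements and then reads the attribute with get().

```python
import xml.etree.ElementTree as ET

XML = """<library>
  <author firstname="Earnest" lastname="Hemingway">
    <works>
      <book name="A Farewell to Arms" year="1929" />
      <book name="For Whom the bell tolls" year="1940" />
      <book name="The Old man and the sea" year="1951" />
    </works>
  </author>
  <author firstname="Victor" lastname="Hugo">
    <works>
      <book name="The Hunchback of Notre-Dame" year="1831" />
      <book name="Les Misérables" year="1862" />
    </works>
  </author>
</library>"""

root = ET.fromstring(XML)

# Select the elements first, then read the 'name' attribute of each one.
names = [book.get("name") for book in root.findall(".//book")]
```

After this runs, names holds the five titles in document order.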

The Cypher query also needs only a small change: instead of returning the entire node, return the ‘name’ property (the XML attribute) of each node.

MATCH (books:book) RETURN books.name

[Image: bookName]

Next up, let’s query the awards conferred on Earnest Hemingway.

This can be achieved via XPath as:

//author[@firstname='Earnest']/awards

which gives the output:

<awards>
  <award name="Pulitzer Prize" category="Fiction" year="1953" />
  <award name="Nobel Prize" category="Literature" year="1954" />
</awards>
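
Attribute predicates of this kind are part of the XPath subset that Python’s xml.etree.ElementTree supports, so the query can be tried as in the sketch below. The XML is trimmed to the parts that matter here, and the variable names are my own.

```python
import xml.etree.ElementTree as ET

XML = """<library>
  <author firstname="Earnest" lastname="Hemingway">
    <works>
      <book name="A Farewell to Arms" year="1929" />
    </works>
    <awards>
      <award name="Pulitzer Prize" category="Fiction" year="1953"></award>
      <award name="Nobel Prize" category="Literature" year="1954"></award>
    </awards>
  </author>
  <author firstname="Victor" lastname="Hugo">
    <works>
      <book name="Les Misérables" year="1862" />
    </works>
  </author>
</library>"""

root = ET.fromstring(XML)

# The <awards> element of the author whose firstname is Earnest.
awards_element = root.find(".//author[@firstname='Earnest']/awards")

# Its direct children are the individual <award> elements.
award_names = [award.get("name") for award in awards_element]
print(award_names)
```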

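As for Cypher:

MATCH (author {firstname: "Earnest"})-[*]->(award:award) RETURN award

This fetches every node labelled ‘award’ that can be reached, over any number of outgoing relationships, from a node whose firstname is Earnest. Note that ‘author’ in this query is only a variable name, not a label, so any node with that firstname would match. Also, unlike the XPath above, which returns the enclosing awards element, the Cypher query returns the individual award nodes.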

[Image: awards]
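
To make the variable-length match more concrete, here is a small sketch that imitates it on the XML tree itself instead of on Neo4j. It assumes that relationships in the graph point from parent to child, which is how I read the pattern but is not something the query itself guarantees; the helper name descendants is made up for this example.

```python
import xml.etree.ElementTree as ET

XML = """<library>
  <author firstname="Earnest" lastname="Hemingway">
    <works>
      <book name="A Farewell to Arms" year="1929" />
    </works>
    <awards>
      <award name="Pulitzer Prize" category="Fiction" year="1953"></award>
      <award name="Nobel Prize" category="Literature" year="1954"></award>
    </awards>
  </author>
  <author firstname="Victor" lastname="Hugo">
    <works>
      <book name="Les Misérables" year="1862" />
    </works>
  </author>
</library>"""


def descendants(element):
    """Yield every element reachable through one or more parent-to-child hops."""
    for child in element:
        yield child
        yield from descendants(child)


root = ET.fromstring(XML)

# Start from any element whose firstname is Earnest (no tag check, just as
# the Cypher pattern has no label on its start node), walk all of its
# descendants, and keep the ones tagged 'award'.
awards = [
    node.get("name")
    for start in root.iter()
    if start.get("firstname") == "Earnest"
    for node in descendants(start)
    if node.tag == "award"
]
print(awards)
```

The walk visits the works and book elements too before discarding them, which hints at why an unbounded traversal costs more than a fixed-depth one on a larger graph.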

The examples above are trivial and aim only to show that Cypher can be used to traverse XML stored in a graph database. I am looking for large and, most importantly, meaningful XML content that can be queried for useful information. Keep watching this space for more.

2 thoughts on “Navigating XML Graph using Cypher”

  1. Really cool blog post!

    You might also want to look into the open source software analytics project http://jqassistant.org which does similar queries with source code (and other software artifacts).

    Do you actually handle duplicate/redundant child elements in the XML and merge them into a single node?

    In your first visualization you don’t see the book and author labels, just “Node” and “Parent”. I’m not sure those are needed at all after importing the data.

    I’d also love to see queries using the relationship types.
    In your last query, I think a better representation of the XPath wouldn’t use variable-length paths, as “award” is directly under “author”. Variable-length paths on arbitrary graphs can quickly get expensive.

    So any of these should do:

    MATCH (author {firstname: "Earnest"})-[:award]->(award:award) RETURN award
    MATCH (author {firstname: "Earnest"})-->(award:award) RETURN award
    MATCH (author {firstname: "Earnest"})-[:award]->(award) RETURN award
    
  2. Oh yes, you have a point. It can handle queries with relationships, and the queries you mentioned should work fine. 🙂
    And regarding the Parent and Node labels: yes, we don’t need them. They are vestiges of an earlier snapshot version, and I plan to get rid of them soon.
    As for duplicate nodes, I would prefer to give the consumer an option. It should be a matter of choice whether to have a complete graph or a graph which follows the structure of the imported XML.

    Also, do you have any interesting datasets to try importing?
