Java XPath Tutorial: How to Parse XML File using XPath in Java

XPath is a query language for finding information in an XML document. You can think of it as (sort of) SQL for XML files. It is used to navigate through the elements and attributes of an XML document, and you can use it from Java through the javax.xml.xpath package.

XPath expressions let you select nodes in an XML document and retrieve the information you need from them.

For this demo, let us consider an XML file that holds information about employees.

<?xml version="1.0"?>
<Employees>
	<Employee emplid="1111" type="admin">
		<firstname>John</firstname>
		<lastname>Watson</lastname>
		<age>30</age>
		<email>johnwatson@sh.com</email>
	</Employee>
	<Employee emplid="2222" type="admin">
		<firstname>Sherlock</firstname>
		<lastname>Homes</lastname>
		<age>32</age>
		<email>sherlock@sh.com</email>
	</Employee>
	<Employee emplid="3333" type="user">
		<firstname>Jim</firstname>
		<lastname>Moriarty</lastname>
		<age>52</age>
		<email>jim@sh.com</email>
	</Employee>
	<Employee emplid="4444" type="user">
		<firstname>Mycroft</firstname>
		<lastname>Holmes</lastname>
		<age>41</age>
		<email>mycroft@sh.com</email>
	</Employee>
</Employees>

I have saved this file at C:\employees.xml. We will use this XML file throughout the demo and fetch information from it using XPath. Before we start, let's note a few facts about the XML file above.

  1. There are 4 employees in our XML file.
  2. Each employee has a unique employee id defined by the attribute emplid.
  3. Each employee also has an attribute type, which defines whether the employee is an admin or a user.
  4. Each employee has four child nodes: firstname, lastname, age and email.
  5. Age is a number.

Let’s get started…

1. Learning Java DOM Parsing API

To use XPath from Java, we first need to understand the basics of DOM parsing. Java provides a DOM parser implementation through the API shown below.

1.1 Creating a Java DOM XML Parser

First, we need to create a document builder using the DocumentBuilderFactory class. Just follow the code. It's pretty much self-explanatory.

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
//...

DocumentBuilderFactory builderFactory =
        DocumentBuilderFactory.newInstance();
DocumentBuilder builder = null;
try {
    builder = builderFactory.newDocumentBuilder();
} catch (ParserConfigurationException e) {
    e.printStackTrace();  
}

1.2 Parsing XML with a Java DOM Parser

Once we have a document builder object, we use it to parse the XML file and create a Document object.

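import java.io.FileInputStream;
import java.io.IOException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
//...

// Declared outside the try block so the later steps can use it
Document xmlDocument = null;
// try-with-resources closes the file stream once parsing is done
try (FileInputStream file = new FileInputStream("c:\\employees.xml")) {
    xmlDocument = builder.parse(file);
} catch (SAXException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}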

The code above parses an XML file from the filesystem. Sometimes you might want to parse XML held in a String instead of reading it from a file. The code below comes in handy for that.

String xml = ...;
Document xmlDocument = builder.parse(new ByteArrayInputStream(xml.getBytes()));
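
When the XML is already in memory, one alternative to going through bytes is to wrap the String in a StringReader, which sidesteps the question of which character encoding getBytes() uses. The following is only a minimal sketch of that idea: the class name XmlStringParsing, the parse helper and the inline sample XML are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class; the name and the helper method are illustrative only.
class XmlStringParsing {

    // Parses XML text that is already in memory, without touching the filesystem.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // An InputSource over a Reader works on characters, so the String
            // never has to be converted to bytes.
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML string", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Employees><Employee emplid=\"1111\">"
                + "<firstname>John</firstname></Employee></Employees>";
        Document document = parse(xml);
        // Prints the name of the root element: Employees
        System.out.println(document.getDocumentElement().getNodeName());
    }
}
```

The checked exceptions are wrapped in an unchecked one here purely to keep the calling code short; in real code you may prefer to let them propagate.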

1.3 Creating an XPath object

Once we have a Document object, we are ready to use XPath. Just create an XPath object using XPathFactory.

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
//...

XPath xPath =  XPathFactory.newInstance().newXPath();

1.4 Using XPath to parse the XML

Use the XPath object to compile an XPath expression and evaluate it on the document. The code below reads the email address of the employee with employee id 3333. It also shows the API calls for reading an XML node and a node list.

import javax.xml.xpath.XPathConstants;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
//...

String expression = "/Employees/Employee[@emplid='3333']/email";

//read a string value
String email = xPath.compile(expression).evaluate(xmlDocument);

//read an xml node using xpath
Node node = (Node) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODE);

//read a nodelist using xpath
NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);
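
Besides NODE and NODESET, XPathConstants also defines STRING, NUMBER and BOOLEAN return types. The sketch below shows what each of them hands back to Java. It is an illustration under assumptions rather than part of the original sample: the class XPathReturnTypes, its evaluate helper and the trimmed two-employee XML are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class; names are illustrative only.
class XPathReturnTypes {

    // Two employees taken from the tutorial's file, inlined to keep the sketch self-contained.
    static final String XML =
            "<Employees>"
            + "<Employee emplid=\"1111\" type=\"admin\"><firstname>John</firstname></Employee>"
            + "<Employee emplid=\"3333\" type=\"user\"><firstname>Jim</firstname></Employee>"
            + "</Employees>";

    // Evaluates the expression against the sample document with the requested return type.
    static Object evaluate(String expression, QName returnType) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expression, document, returnType);
        } catch (ParserConfigurationException | SAXException
                | IOException | XPathExpressionException e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // NUMBER comes back as a java.lang.Double.
        Double employees = (Double) evaluate(
                "count(/Employees/Employee)", XPathConstants.NUMBER);
        // BOOLEAN comes back as a java.lang.Boolean.
        Boolean hasAdmin = (Boolean) evaluate(
                "count(//Employee[@type='admin']) > 0", XPathConstants.BOOLEAN);
        // STRING comes back as the text of the first matching node.
        String name = (String) evaluate(
                "/Employees/Employee[@emplid='3333']/firstname", XPathConstants.STRING);
        // Prints: 2 true Jim
        System.out.println(employees.intValue() + " " + hasAdmin + " " + name);
    }
}
```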

2. Learning XPath Expressions

As mentioned above, XPath uses path expressions to select a node or a list of nodes from an XML document. Here is a list of useful paths and expressions that can be used to select any node or node list from an XML document.

Expression            Description
nodename              Selects all nodes with the name “nodename”
/                     Selects from the root node
//                    Selects nodes in the document from the current node that match the selection, no matter where they are
.                     Selects the current node
..                    Selects the parent of the current node
@                     Selects attributes
Employee              Selects all nodes with the name “Employee”
Employees/Employee    Selects all Employee elements that are children of Employees
//Employee            Selects all Employee elements, no matter where they are in the document
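
To see how these path expressions behave when evaluated from Java, here is a small sketch that starts from one node and moves around with ".", ".." and "@". It is an illustration only: the class PathExpressionDemo, its helper methods and the two-employee inline XML are assumptions, not part of the original sample.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class; names are illustrative only.
class PathExpressionDemo {

    static final String XML =
            "<Employees>"
            + "<Employee emplid=\"1111\"><firstname>John</firstname></Employee>"
            + "<Employee emplid=\"2222\"><firstname>Sherlock</firstname></Employee>"
            + "</Employees>";

    static Document parse() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    // Returns the first node selected by the expression, relative to the context node.
    static Node node(Object context, String expression) {
        try {
            return (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, context, XPathConstants.NODE);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Bad expression: " + expression, e);
        }
    }

    // Returns the string value of the expression, relative to the context node.
    static String text(Object context, String expression) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expression, context);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Bad expression: " + expression, e);
        }
    }

    public static void main(String[] args) {
        Document document = parse();
        // "//" searches the whole document; NODE keeps the first match.
        Node firstname = node(document, "//firstname");
        // "." is the current node, so this prints its text: John
        System.out.println(text(firstname, "."));
        // ".." steps up to the parent Employee, "@" selects its attribute: 1111
        System.out.println(text(firstname, "../@emplid"));
        // A leading "/" starts from the root even from a nested context node: Sherlock
        System.out.println(text(firstname, "/Employees/Employee[2]/firstname"));
    }
}
```

Note that element names are matched case-sensitively, so an expression has to spell Employees and Employee exactly as the XML file does.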

The expressions below use predicates. Predicates are written in square brackets [ ... ] and are used to find a specific node or a node that contains a specific value.

Path Expression                  Result
/Employees/Employee[1]           Selects the first Employee element that is a child of the Employees element
/Employees/Employee[last()]      Selects the last Employee element that is a child of the Employees element
/Employees/Employee[last()-1]    Selects the last-but-one Employee element that is a child of the Employees element
//Employee[@type='admin']        Selects all Employee elements that have an attribute named type with a value of ‘admin’
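
The predicates above can be tried out with a short program. This is a sketch under assumptions: the class PredicateDemo, its select helper and the trimmed inline copy of the employee data (first names and types only) are hypothetical and exist just to make the example self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class; names are illustrative only.
class PredicateDemo {

    // Trimmed copy of the tutorial's four employees.
    static final String XML =
            "<Employees>"
            + "<Employee type=\"admin\"><firstname>John</firstname></Employee>"
            + "<Employee type=\"admin\"><firstname>Sherlock</firstname></Employee>"
            + "<Employee type=\"user\"><firstname>Jim</firstname></Employee>"
            + "<Employee type=\"user\"><firstname>Mycroft</firstname></Employee>"
            + "</Employees>";

    // Returns the text of every node the expression selects.
    static List<String> select(String expression) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, document, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (ParserConfigurationException | SAXException
                | IOException | XPathExpressionException e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // XPath positions start at 1, not 0.
        System.out.println(select("/Employees/Employee[1]/firstname"));        // [John]
        System.out.println(select("/Employees/Employee[last()]/firstname"));   // [Mycroft]
        System.out.println(select("/Employees/Employee[last()-1]/firstname")); // [Jim]
        System.out.println(select("//Employee[@type='admin']/firstname"));     // [John, Sherlock]
    }
}
```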

There are other useful expressions that you can use to query the data.

Read this W3Schools page for more details: http://www.w3schools.com/xpath/xpath_syntax.asp

3. Examples: Query XML document using XPath

Below are a few examples that use different XPath expressions to fetch information from the XML document.

3.1 Read firstname of all employees

The expression below reads the firstname of all the employees.

String expression = "/Employees/Employee/firstname";
System.out.println(expression);
NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);
for (int i = 0; i < nodeList.getLength(); i++) {
    System.out.println(nodeList.item(i).getFirstChild().getNodeValue()); 
}

Output:

John
Sherlock
Jim
Mycroft

3.2 Read a specific employee using employee id

The expression below reads the information for the employee with emplid = 2222. Note how we use the API to retrieve the node and then traverse its children to print each XML tag and its value.

String expression = "/Employees/Employee[@emplid='2222']";
System.out.println(expression);
Node node = (Node) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODE);
if (node != null) {
	NodeList children = node.getChildNodes();
	for (int i = 0; i < children.getLength(); i++) {
		Node child = children.item(i);
		// Skip the whitespace text nodes that sit between the elements
		if (child.getNodeType() == Node.ELEMENT_NODE) {
			System.out.println(child.getNodeName() + " : " + child.getFirstChild().getNodeValue());
		}
	}
}

Output:

firstname : Sherlock
lastname : Homes
age : 32
email : sherlock@sh.com
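
The Employee node selected this way also carries the emplid and type attributes, which getChildNodes() does not return. One way to read them is through Node.getAttributes(), which gives a NamedNodeMap. The sketch below assumes a hypothetical class EmployeeAttributes with a helper attributesOf and a one-employee inline XML; none of these names come from the original sample.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class; names are illustrative only.
class EmployeeAttributes {

    static final String XML =
            "<Employees>"
            + "<Employee emplid=\"2222\" type=\"admin\"><firstname>Sherlock</firstname></Employee>"
            + "</Employees>";

    // Collects the attributes of the first node matched by the expression.
    // A TreeMap is used so the result prints in a predictable (sorted) order.
    static Map<String, String> attributesOf(String expression) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            Node node = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, document, XPathConstants.NODE);
            Map<String, String> result = new TreeMap<>();
            // getAttributes() is null for node types that cannot carry attributes.
            if (node != null && node.getAttributes() != null) {
                NamedNodeMap attributes = node.getAttributes();
                for (int i = 0; i < attributes.getLength(); i++) {
                    Node attribute = attributes.item(i);
                    result.put(attribute.getNodeName(), attribute.getNodeValue());
                }
            }
            return result;
        } catch (ParserConfigurationException | SAXException
                | IOException | XPathExpressionException e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Prints: {emplid=2222, type=admin}
        System.out.println(attributesOf("/Employees/Employee[@emplid='2222']"));
    }
}
```

If only one attribute is needed, an expression ending in the attribute itself, such as /Employees/Employee[@emplid='2222']/@type, can be evaluated as a string instead.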

3.3 Read firstname of all employees who are admin

This is again a predicate example. It reads the firstname of all employees who are admins (defined by type='admin').

String expression = "/Employees/Employee[@type='admin']/firstname";
System.out.println(expression);
NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);
for (int i = 0; i < nodeList.getLength(); i++) {
    System.out.println(nodeList.item(i).getFirstChild().getNodeValue()); 
}

Output:

John
Sherlock

3.4 Read firstname of all employees who are older than 40 years

See how we use a predicate to filter employees whose age is greater than 40.

String expression = "/Employees/Employee[age>40]/firstname";
NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);
for (int i = 0; i < nodeList.getLength(); i++) {
    System.out.println(nodeList.item(i).getFirstChild().getNodeValue()); 
}

Output:

Jim
Mycroft
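
Because age is compared as a number, the same values can also be fed to XPath's numeric functions such as count() and sum(). The following sketch is an assumption-laden illustration: the class AgeStatistics, its number helper and the inline XML (only the age elements of the four employees) are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class; names are illustrative only.
class AgeStatistics {

    // Ages of the tutorial's four employees.
    static final String XML =
            "<Employees>"
            + "<Employee><age>30</age></Employee>"
            + "<Employee><age>32</age></Employee>"
            + "<Employee><age>52</age></Employee>"
            + "<Employee><age>41</age></Employee>"
            + "</Employees>";

    // Evaluates an expression whose result is an XPath number.
    static double number(String expression) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return (Double) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, document, XPathConstants.NUMBER);
        } catch (ParserConfigurationException | SAXException
                | IOException | XPathExpressionException e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Number of employees older than 40. Prints: 2.0
        System.out.println(number("count(/Employees/Employee[age>40])"));
        // Average age; XPath 1.0 uses "div" for division. Prints: 38.75
        System.out.println(number(
                "sum(/Employees/Employee/age) div count(/Employees/Employee)"));
    }
}
```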

3.5 Read firstname of first two employees (defined in xml file)

Within predicates, you can use position() to identify the position of an XML element. Here we select the first two employees using position().

String expression = "/Employees/Employee[position() <= 2]/firstname";
NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);
for (int i = 0; i < nodeList.getLength(); i++) {
    System.out.println(nodeList.item(i).getFirstChild().getNodeValue()); 
}

Output:

John
Sherlock

4. Complete Java source code

To run this source, create a basic Java project in your IDE, or save the code below as Main.java and execute it. It needs the employees.xml file as input, so copy the employee XML defined at the start of this tutorial to C:\employees.xml.

package net.viralpatel.java;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class Main {
	public static void main(String[] args) {

		try {
			FileInputStream file = new FileInputStream(new File("c:/employees.xml"));
				
			DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
			
			DocumentBuilder builder =  builderFactory.newDocumentBuilder();
			
			Document xmlDocument = builder.parse(file);

			XPath xPath =  XPathFactory.newInstance().newXPath();

			System.out.println("*************************");
			String expression = "/Employees/Employee[@emplid='3333']/email";
			System.out.println(expression);
			String email = xPath.compile(expression).evaluate(xmlDocument);
			System.out.println(email);

			System.out.println("*************************");
			expression = "/Employees/Employee/firstname";
			System.out.println(expression);
			NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);
			for (int i = 0; i < nodeList.getLength(); i++) {
			    System.out.println(nodeList.item(i).getFirstChild().getNodeValue()); 
			}

			System.out.println("*************************");
			expression = "/Employees/Employee[@type='admin']/firstname";
			System.out.println(expression);
			nodeList = (NodeList) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);
			for (int i = 0; i < nodeList.getLength(); i++) {
			    System.out.println(nodeList.item(i).getFirstChild().getNodeValue()); 
			}

			System.out.println("*************************");
			expression = "/Employees/Employee[@emplid='2222']";
			System.out.println(expression);
			Node node = (Node) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODE);
			if(null != node) {
				nodeList = node.getChildNodes();
				for (int i = 0;null!=nodeList && i < nodeList.getLength(); i++) {
					Node nod = nodeList.item(i);
					if(nod.getNodeType() == Node.ELEMENT_NODE)
						System.out.println(nodeList.item(i).getNodeName() + " : " + nod.getFirstChild().getNodeValue()); 
				}
			}
			
			System.out.println("*************************");

			expression = "/Employees/Employee[age>40]/firstname";
			nodeList = (NodeList) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);
			System.out.println(expression);
			for (int i = 0; i < nodeList.getLength(); i++) {
			    System.out.println(nodeList.item(i).getFirstChild().getNodeValue()); 
			}
		
			System.out.println("*************************");
			expression = "/Employees/Employee[1]/firstname";
			System.out.println(expression);
			nodeList = (NodeList) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);
			for (int i = 0; i < nodeList.getLength(); i++) {
			    System.out.println(nodeList.item(i).getFirstChild().getNodeValue()); 
			}
			System.out.println("*************************");
			expression = "/Employees/Employee[position() <= 2]/firstname";
			System.out.println(expression);
			nodeList = (NodeList) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);
			for (int i = 0; i < nodeList.getLength(); i++) {
			    System.out.println(nodeList.item(i).getFirstChild().getNodeValue()); 
			}

			System.out.println("*************************");
			expression = "/Employees/Employee[last()]/firstname";
			System.out.println(expression);
			nodeList = (NodeList) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);
			for (int i = 0; i < nodeList.getLength(); i++) {
			    System.out.println(nodeList.item(i).getFirstChild().getNodeValue()); 
			}

			System.out.println("*************************");

		} catch (FileNotFoundException e) {
			e.printStackTrace();
		} catch (SAXException e) {
			e.printStackTrace();
		} catch (IOException e) {
			e.printStackTrace();
		} catch (ParserConfigurationException e) {
			e.printStackTrace();
		} catch (XPathExpressionException e) {
			e.printStackTrace();
		}		
	}
}

That’s all folks :)



49 Comments

  • Murali Prashanth 14 May, 2013, 10:00

    Nice tutorial, infact i was searching for this thanks..

  • Karina 6 June, 2013, 5:53

    This is a great tutorial. Thanks much for all your help!
    You took the time to nicely demonstrate the basics of XPath XML parsing.
    Keep up the good work!

  • GreekStudent 21 June, 2013, 2:47

    Thanks! Very useful for exam preparation :)

  • Hoda 23 June, 2013, 15:16

    Thanks for your useful tutorial.
    I have a question that I’d be glad if you answer me.
    If I want to read, tag elements, what should I do?
    I tried it: System.out.println(nod.getAttributes());
    but the output is : com.sun.org.apache.xerces.internal.dom.AttributeMap@1729854
    so I tried it: System.out.println((Element)nod.getAttributes());
    but the output is: Exception in thread “main” java.lang.ClassCastException: com.sun.org.apache.xerces.internal.dom.AttributeMap cannot be cast to org.w3c.dom.Element
    now Im’Confused. please guide me how should I read them?

  • Abhishek 26 June, 2013, 16:56

    Thanks

  • Susovan 11 July, 2013, 15:04

    Superb tips..Viral..keep up

  • saurabh 16 July, 2013, 16:24

    ur tutorials helped me alot for understanding xml to db data transfer………tooo goood …. keep going on…and thank a lalot

  • Gergely 23 July, 2013, 12:16

    Thank you, ViralPatel, it’s nice!

    It helps me a lot.

  • Rehan 26 July, 2013, 17:36

    Thanks Viral

  • Pratik 6 August, 2013, 8:15

    Thanks ! It was quite useful

  • Deepak 23 August, 2013, 0:29

    Excellent Tutorial Viral, Thanks

  • Kundan Kumar 28 August, 2013, 11:56

    Hi i need to create following xml file using DOM in java can you help me for the same … it’s very urgent.

    <PersonalInfo xsi:type="NameValuePairType">
    	<Name> PGI </Name>
    	<value xsi:type="NameValuePairType">
    		<Name>PGI-800</Name>
    		<Value xsi:type="xsi:type="xs:string">eeeeeeeeeee</Value>
    	</value>
    	<value xsi:type="NameValuePairType">
    		<Name>PGI-801</Name>
    		<Value xsi:type="xsi:type="xs:string">fffffffffff</Value>
    	</value>
    </PersonalInfo>
    
  • Anonymous 28 August, 2013, 22:32

    Yo may want to show imports when you provide a sample. What class of NODE is that? SOAP? Why would I need SOAP import to use XPath? All these questions come when you do not provide complete sample with tutorial. We simply are not experimenting with guesses and class cast exceptions.

    • Viral Patel 29 August, 2013, 13:03

      Check the complete Java source at the end of tutorial. The Node class is org.w3c.dom.Node.

  • Anonymous 29 August, 2013, 3:18

    How do I parse String? Honestly it is overkill requiring File to parse a string. Most of XML documents as simply strings and as such they should not need file or URI anywhere. Location is not function of parsing and it is very poor design of the interface. The only reasonable interface is either String or InputStream.

    I am switching to JIBX or JAXB.

    If you do not agree than explain to me what is purpose of DocumentBuilderFactory and then DocumentBuilder. Those are creation OO patterns and there is no reason to use those (poor design patterns) if you look at pattern motivation by book. I would recommend those who created them to take a look at “Gang of Four” book and start using head. We have something as simple as constructor and we do not need singletons or builders to parse document.

    Waste of precious work on unnecessary operations to parse using XPath witch handling five types of exceptions when we need only two realistaically.

  • Locks 3 September, 2013, 14:41

    Thanks. I need this badly.

  • Star 3 September, 2013, 18:14

    It is very useful

  • Star 4 September, 2013, 10:31

    Thank once again. It helped me a lot :). Thanks for this wonderful documentation with clear explanation.

  • edy 6 September, 2013, 16:47

    Great stuff, straight to the point

  • dimsob 25 September, 2013, 16:24

    Hi.This promlem:
    my xml:

     <API request="RetailAnonQPRequest" version="1.0"><Response result="OK"><Auth terminal_id="2" terminal_pwd="password"/><RetailAnonQPResponse><RetailReceipt msg_uid="1187769045" redemption_code="146866093644444" security_code="d4eaf1a9ed7a"><Picks><Pick numbers="01,04,18,27,42"/><Pick numbers="33,32,35,37,37"/></Picks><Entries><Entry draw_name="Some" draw_time="2013-09-01T14:00:00" pool_name="APIPool"/></Entries></RetailReceipt></RetailAnonQPResponse></Response></API>
    

    how to take ’33,32,35,37,37′ from second , ?
    thanks.

  • Ankit Gupta 3 October, 2013, 14:47

    Very nice tutorial. Explained very nicely even for newbies :). Thanks a lot

  • Prakash 15 October, 2013, 17:12

    How to parse the blow xml using xpath

    <!DOCTYPE article_set SYSTEM "s1.dtd">
    <article_set dtd_version="4.13">
    <article lang="EN" rev="2" ms_no="PHY2-2013-05-0073.R2" export_date="2013-10-15 00:00:00.0">
    <journal>
    <publisher_name>ABC</publisher_name>
    <full_journal_title>ABC</full_journal_title>
    <journal_abbreviation>CDE</journal_abbreviation>
    <pubmed_abbreviation/>
    <issn issn_type="print">232323</issn>
    <issn issn_type="digital">3434</issn>
    </journal>
    </article>

    I am getting NullPointerException when reading pubmed_abbreviation tag.

    Please help with some ideas.

  • abhilash 25 October, 2013, 11:23

    abcdefghijklmno
    i want to write an xpath to get

    abcdefghijlkmno as output.
    how to do it??

  • abhilash 25 October, 2013, 11:25
    <h2>abc<b>def</b>ghi<i>jkl</i>mno</h2>

    i want abcdefghijklmno as output what xpath should i write

  • Jewel 11 November, 2013, 15:15

    what it is work on jdk version?
    is it work for jdk1.5?

  • Rakshit Sangani 11 November, 2013, 15:28

    Extremly x 100 !!!! nice tutorial. :)

  • Teena 14 November, 2013, 17:20

    This is wonderful documentation with clear explanation.
    our requirement is based on the target system schema we need to map the source and traget system to match to traget system and again write back to XML.
    Input is XML and
    Expected output : XML

  • rJbueno 20 November, 2013, 22:54

    Good stuff! I needed to make the switch from PHP XPath to Java XPath, this tutorial really helped!

  • Venu 23 November, 2013, 10:18

    Nice Tutorial ……

    • chimata 28 January, 2014, 18:06

      very nice tutorial

  • Gr Kumar 29 November, 2013, 10:09

    its simply clear

  • piyush 28 December, 2013, 0:27

    Hi i m making a search box in jsp,Its like i am reading the value entered in search box and comaring it with xml and displaying the desired result.
    Please reply quickly as its urgent on how it can be done?

  • triathlonmarathon 28 December, 2013, 6:54

    Thank you very much Viral ;-)

  • Chetan 29 December, 2013, 23:36

    I am using below method to get Document object

     public static Document getDocument(String requestUrl,int timeout)
        {
        	 Document doc = null;
        	 StringBuilder builder = new StringBuilder();
            try {
                DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
                DocumentBuilder db = dbf.newDocumentBuilder();
                URL url = new URL(requestUrl);
                URLConnection con = url.openConnection();
                con.setConnectTimeout(timeout);//The timeout in mills
                doc = db.parse(con.getInputStream());
                } catch (Exception e) {
               LoggerUtil.error(CLASS_NAME,"getDocument(String requestUrl,int timeout)",e);
            }
            return doc;
        
        }
    

    Now problem is the response have some special characters(&) because of which parsing fails . how to solve this problem

  • rico 31 December, 2013, 14:19

    Hi,

    xpath expressions doesn’t work in jdk 1.7.

    this expression will work on jdk 1.6.
    String expression = “/Response/Item[@id='1']/Value”;

    can you give a sample to filter attribute using @id?
    example xml:

    Test 1

    Test 2

    Any tips?
    Br,
    Rico

  • priya 9 January, 2014, 17:27

    thanks,good job

  • Anne 11 January, 2014, 4:43

    this tutorial is awesome, thanks!

  • fr0st 21 January, 2014, 1:54

    Man, you’re awesome. This is by far the best XML parsing tutorial I’ve read on the internet. Thanks for the amazing write-up. Using XPath suits my need as I need to process each tag individually.

  • Sam 12 February, 2014, 15:12

    Thank you. Very useful Blog. Simple and Beautifully Explained.

  • Anuj Parashar 18 February, 2014, 23:30

    This is just great. Few days back, I was searching exactly the same thing and here you are. Keep it up.

  • seenivasan 21 February, 2014, 21:20

    first of all thanks .
    really its usefull examlple..

    i have one more Doubt
    how u do get type and empid in same time ????/

    thanks advance

  • Ariel 1 March, 2014, 3:16

    Why don’t you explain namespaces with XPath?

  • Jayashree 18 March, 2014, 17:55

    Very nice tutorial :) Will be much appreciated if you have provided xpath example for a xml that contains namespace. Keep up the good work :)

  • Mohammad 22 March, 2014, 14:02

    thanks so much:)
    great tutorial

  • Juan 27 March, 2014, 3:35

    Thanks a lot!!!
    Great work dude…it was really usefull for me!

  • Gurpreet Singh 1 April, 2014, 9:49

    Thnx for tutorial .. but i m not able to find out solution of my problem anywhere ..
    if anyone hv answer please reply..
    consider below XML

    <Employees dept="hi-tech">
        <Employee emplid="1111" type="admin">
            <firstname>John</firstname>
            <lastname>Watson</lastname>
            <age>30</age>
            <email>johnwatson@sh.com</email>
        </Employee>
        <Employee emplid="2222" type="admin">
            <firstname>Sherlock</firstname>
            <lastname>Homes</lastname>
            <age>32</age>
            <email>sherlock@sh.com</email>
        </Employee>
        <Employee emplid="3333" type="user">
            <firstname>Jim</firstname>
            <lastname>Moriarty</lastname>
            <age>52</age>
            <email>jim@sh.com</email>
        </Employee>
        <Employee emplid="4444" type="user">
            <firstname>Mycroft</firstname>
            <lastname>Holmes</lastname>
            <age>41</age>
            <email>mycroft@sh.com</email>
        </Employee>
    </Employees>
    

    Now i want to get value for attribute dept .i.e. ‘hi-tech’ ..

    • fngyjx@gmail.com 15 April, 2014, 9:23

      /Employees/@dept

  • Ashish Desai 7 April, 2014, 18:32

    One of the most comprehensive and straightforward tutorial for using XPath that i have found on the internet. Great work Viral.
