RSS, or Really Simple Syndication, is a method of sharing and broadcasting content, such as news, from a website. Because the content is expressed in XML, items such as news articles can be automatically downloaded into a news reader or published on another website. There are two ways of using RSS: to share your data with others, or to harvest others’ data for your site.
The first version of RSS, called RDF Site Summary, was created by Ramanathan V. Guha at Netscape in March 1999. This version became known as RSS 0.9. Version 0.91 was released by Dan Libby of Netscape in July 1999; he also renamed RSS to “Rich Site Summary”.
Introduction to JSTL
The JavaServer Pages Standard Tag Library (JSTL) is a component of the Java Enterprise Edition web application development platform (J2EE 1.4 SDK) released by Sun Microsystems. JSTL provides an effective way to embed logic within a JSP page without writing Java code directly in the page. Using a standardized tag set, rather than breaking in and out of Java code, leads to more maintainable code and enables separation of concerns between the development of the application code and the user interface.
Commonly used JAR files
Parsing XML using JSTL
The JSTL tags with the x prefix (the XML tag library) can be used to parse XML documents.
Let us keep the XML content in a file called data.xml. The following is the content of data.xml:
Assume that this file is placed next to the JSP file in which we need to parse the XML document. The following JSP file parses data.xml and displays the information of each student as a row in a table.
Content of ShowStudents.jsp
Understanding the code
To parse a given XML document, we first need to import it using the <c:import> tag. This tag imports the content of a URL into a variable. In the previous example, the content of data.xml is copied into the variable xmlDoc.
To parse the imported XML content, the <x:parse> tag is used; its result is stored in the parsedDocument variable. This variable can then be used to access child elements and their properties. The <x:forEach> tag iterates over the nodes selected by an XPath expression. In the example above, we iterate over the <student> elements with <x:forEach select="$parsedDocument/persons/student">. Child elements of <student> can be read with <x:out select="name" />, which prints the value of the <name> element.
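To see what the XPath selections above amount to, the same steps can be sketched outside a JSP page in plain Java with the JDK's built-in XML APIs. This is only an illustration, not JSTL code: the class name StudentXPathSketch and the sample student names are hypothetical, and only the persons/student/name structure is taken from the example above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class StudentXPathSketch {

    // Hypothetical sample data; only the element names come from the tutorial.
    static final String SAMPLE_XML =
        "<persons>"
      + "<student><name>Alice</name></student>"
      + "<student><name>Bob</name></student>"
      + "</persons>";

    // Comparable to <x:parse>: turn XML text into a parsed document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Comparable to <x:forEach select="$parsedDocument/persons/student">
    // with <x:out select="name" /> inside the loop body.
    static List<String> studentNames(Document doc) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList students = (NodeList) xpath.evaluate(
                "/persons/student", doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < students.getLength(); i++) {
                Node student = students.item(i);
                // "name" is evaluated relative to the current <student> node.
                names.add(xpath.evaluate("name", student));
            }
            return names;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Invalid XPath expression", e);
        }
    }

    public static void main(String[] args) {
        for (String name : studentNames(parse(SAMPLE_XML))) {
            System.out.println(name);
        }
    }
}
```

As in the JSP version, the inner expression is relative to the node currently being visited, which is why a bare name is enough inside the loop.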
Understanding structure of RSS 2.0
Since an RSS 2.0 feed must be valid XML, the first line of the feed must be the XML declaration.
The root element of an RSS 2.0 feed is <rss>, which contains a single <channel> element. All of the feed content goes inside these tags.
Next comes the information about the feed, such as the title of the feed, its description, the link to the site, and so on.
A channel may contain any number of <item>s. An item may represent a “story”, much like a story in a newspaper or magazine; if so, its description is a synopsis of the story, and the link points to the full story. An item may also be complete in itself; if so, the description contains the text (entity-encoded HTML is allowed), and the link and title may be omitted. All elements of an item are optional; however, at least one of title or description must be present. A typical item has a title, link, description, publication date and guid.
Elements of <channel>
- title: The name of the channel.
- link: The URL to the HTML website corresponding to the channel.
- language: The language the channel is written in.
- copyright: Copyright notice for content in the channel.
- managingEditor: Email address for person responsible for editorial content.
- webMaster: Email address for person responsible for technical issues relating to channel.
- pubDate: The publication date for the content in the channel.
- category: Specify one or more categories that the channel belongs to.
- generator: A string indicating the program used to generate the channel.
- image: Specifies a GIF, JPEG or PNG image that can be displayed with the channel.
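The channel-level structure described above can be illustrated with a small Java sketch that reads a few of these elements through XPath. The feed content, the class name ChannelInfoSketch and the example.com URL are hypothetical and chosen only for illustration; the element names follow the RSS 2.0 format.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class ChannelInfoSketch {

    // Hypothetical minimal feed: XML declaration, <rss> root, one <channel>.
    static final String SAMPLE_FEED =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
      + "<rss version=\"2.0\">"
      + "<channel>"
      + "<title>Example News</title>"
      + "<link>http://www.example.com/</link>"
      + "<description>Latest headlines from Example News</description>"
      + "<language>en-us</language>"
      + "</channel>"
      + "</rss>";

    // Reads selected channel elements into a map keyed by element name.
    // An element that is absent from the feed yields an empty string.
    static Map<String, String> channelInfo(String feedXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(feedXml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Map<String, String> info = new LinkedHashMap<>();
            for (String name : new String[] {"title", "link", "description", "language", "copyright"}) {
                info.put(name, xpath.evaluate("/rss/channel/" + name, doc));
            }
            return info;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse feed", e);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Invalid XPath expression", e);
        }
    }

    public static void main(String[] args) {
        channelInfo(SAMPLE_FEED).forEach((name, value) ->
            System.out.println(name + ": " + value));
    }
}
```

The sample feed has no copyright element, so that entry comes back empty, which is a simple way to handle optional channel elements.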
Elements of <item>
- title: The title of the item.
- link: The URL of the item.
- description: The item synopsis.
- author: Email address of the author of the item.
- category: Includes the item in one or more categories.
- comments: URL of a page for comments relating to the item.
- enclosure: Describes a media object that is attached to the item.
- guid: A string that uniquely identifies the item.
- pubDate: Indicates when the item was published.
- source: The RSS channel that the item came from.
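Because every item element is optional as long as a title or a description is present, code that reads items should not assume that any particular element exists. The following Java sketch shows one possible way to handle this; the feed content and the class name RssItemsSketch are hypothetical, and the rule of falling back to the description when the title is missing is our own assumption, not something the RSS format prescribes.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class RssItemsSketch {

    // Hypothetical feed: the first item is a "story" with a link,
    // the second is complete in itself and has only a description.
    static final String SAMPLE_FEED =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
      + "<rss version=\"2.0\"><channel>"
      + "<title>Example News</title>"
      + "<link>http://www.example.com/</link>"
      + "<description>Latest headlines</description>"
      + "<item>"
      + "<title>First story</title>"
      + "<link>http://www.example.com/stories/1</link>"
      + "<description>Synopsis of the first story.</description>"
      + "<guid>http://www.example.com/stories/1</guid>"
      + "</item>"
      + "<item>"
      + "<description>A short note that is complete in itself.</description>"
      + "</item>"
      + "</channel></rss>";

    // Returns one display line per <item>: the title (or the description
    // when there is no title), followed by the link when one is present.
    static List<String> itemSummaries(String feedXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(feedXml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList items = (NodeList) xpath.evaluate(
                "/rss/channel/item", doc, XPathConstants.NODESET);
            List<String> summaries = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                Node item = items.item(i);
                // A missing child element evaluates to an empty string.
                String title = xpath.evaluate("title", item);
                String description = xpath.evaluate("description", item);
                String link = xpath.evaluate("link", item);
                String label = title.isEmpty() ? description : title;
                summaries.add(link.isEmpty() ? label : label + " <" + link + ">");
            }
            return summaries;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse feed", e);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Invalid XPath expression", e);
        }
    }

    public static void main(String[] args) {
        itemSummaries(SAMPLE_FEED).forEach(System.out::println);
    }
}
```

The selection /rss/channel/item plays the same role here as the select attribute of <x:forEach> in a JSP page.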
Parsing RSS using JSTL
We will create a parser that parses a given RSS feed and displays it in a JSP page using JSTL. The page will have a text box in which the user can specify the feed URL. This URL will be submitted to the same JSP page and then will be parsed using
- JSTL Home Page (https://java.sun.com/products/jsp/jstl/)
- Documentation on XML tag library (http://java.sun.com/j2ee/1.4/docs/tutorial/doc/JSTL5.html)
- Information about RSS (http://en.wikipedia.org/wiki/RSS_(file_format))
This tutorial on parsing RSS using JSTL uses Java technologies such as JSP and JSTL and focuses on the RSS 2.0 format. Other free, open source tools do similar work and parse different feed formats such as RSS 0.90, RSS 0.91, and Atom. Readers who are interested in these formats or tools can refer to the following links:
- RSS 1.0 Specification (http://web.resource.org/rss/1.0).
- Project Rome: Open Source Java tools for parsing, generating and publishing RSS and Atom feeds. (https://rome.dev.java.net/).
- Parsing RSS in .NET (http://www.rssdotnet.com/).