SAX 1.0: The Simple API for XML
(Reproduced with kind permission of Wrox Press: https://www.wrox.com)
The Rule-Based Design Pattern
An alternative way of structuring a SAX application, which
again has the objective of separating functions and keeping the structure
modular and simple, is a rule-based approach.
In general, rule-based programs use an "Event-Condition-Action"
model: they contain a collection of rules of the form "if this event
occurs under these conditions, perform this action". Rule-based
programming can thus be seen as a natural extension of event-based programming.
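As a rough illustration of the Event-Condition-Action idea, independent of SAX, a rule can be modelled as a condition paired with an action, and a rule set as a list that fires the first rule whose condition matches. The names `RuleSketch`, `Rule`, `RuleSet` and `dispatch` are hypothetical and chosen only for this sketch; the events here are plain element names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Hypothetical illustration of an Event-Condition-Action rule set. */
public class RuleSketch {

    /** One rule: a condition to test against an event, and an action to run if it holds. */
    static final class Rule<E> {
        final Predicate<E> condition;
        final Consumer<E> action;

        Rule(Predicate<E> condition, Consumer<E> action) {
            this.condition = condition;
            this.action = action;
        }
    }

    /** Holds the rules and fires the first one whose condition matches. */
    static final class RuleSet<E> {
        private final List<Rule<E>> rules = new ArrayList<>();

        void add(Predicate<E> condition, Consumer<E> action) {
            rules.add(new Rule<>(condition, action));
        }

        /** Returns true if some rule fired for this event. */
        boolean dispatch(E event) {
            for (Rule<E> rule : rules) {
                if (rule.condition.test(event)) {
                    rule.action.accept(event);
                    return true;
                }
            }
            return false;
        }
    }

    /** Runs a few element names through a small rule set and returns what the actions logged. */
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        RuleSet<String> rules = new RuleSet<>();
        rules.add(name -> name.equals("title"), name -> log.add("title rule"));
        rules.add(name -> name.startsWith("a"), name -> log.add("a* rule: " + name));
        rules.dispatch("title");
        rules.dispatch("author");
        rules.dispatch("price"); // no rule matches, so nothing is logged
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [title rule, a* rule: author]
    }
}
```

The event-based part is the call to `dispatch`; the rule-based part is that the caller never writes the if-then-else chain itself.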
The processing model of XSL (discussed in Chapter 9) can be
seen as an example of rule-based programming. Each XSL template constitutes one
rule: the event is the processing of a node in the source document; the
condition is the pattern that controls which template is activated, and the
action is the body of the template. We can use the same concepts in a SAX
application.
The diagram below illustrates the structure of a rule-based
SAX application. The input from the XML parser is fed into a switch, which
evaluates the events against the defined conditions, and decides which actions
to invoke. The actions are then passed to processing modules, each of which is
designed to perform one specific task.
There are all sorts of ways conditions and actions could be
implemented, but we'll describe a very simple implementation, where the
condition is based only on element type.
Firstly, let's write the DocumentHandler. We'll call it Switcher because its job is to switch
processing to a piece of code that handles the specific element type.
What Switcher does is maintain a set of rules as a Hashtable, indexed by
element type. The application can nominate a class called an ElementHandler to
process a particular element type. When the parser reports an element start tag,
the appropriate ElementHandler is located in the set of rules, and it is called
to process the start tag. At the same time, the ElementHandler is remembered on
a stack, so that the same ElementHandler can be used to process the end tag and
any character data occurring immediately within this element.
Here’s the Switcher code:

    import org.xml.sax.*;
    import java.util.*;

    /**
     * Switcher is a DocumentHandler that directs events to an appropriate element
     * handler based on the element type.
     */
    public class Switcher extends HandlerBase
    {
        private Hashtable rules = new Hashtable();
        private Stack stack = new Stack();

        /**
         * Define processing for an element type.
         */
        public void setElementHandler(String name, ElementHandler handler)
        {
            rules.put(name, handler);
        }

        /**
         * Start of an element. Decide what handler to use, and call it.
         */
        public void startElement (String name, AttributeList atts)
            throws SAXException
        {
            ElementHandler handler = (ElementHandler)rules.get(name);
            stack.push(handler);
            if (handler!=null)
            {
                handler.startElement(name, atts);
            }
        }

        /**
         * End of an element.
         */
        public void endElement (String name) throws SAXException
        {
            ElementHandler handler = (ElementHandler)stack.pop();
            if (handler!=null)
            {
                handler.endElement(name);
            }
        }

        /**
         * Character data.
         */
        public void characters (char[] ch, int start, int length) throws SAXException
        {
            ElementHandler handler = (ElementHandler)stack.peek();
            if (handler!=null)
            {
                handler.characters(ch, start, length);
            }
        }
    }

An ElementHandler is rather like a DocumentHandler, but it
only ever gets to process a subset of the events: element start and end, and
character data. So although we could use a DocumentHandler here, we've defined
a special class. This serves both as a definition of the interface and as a
superclass for real element handlers: good Java coding practice might suggest
using a separate interface class, but this will do for now.

    import org.xml.sax.*;

    /**
     * ElementHandler is a class that processes the start and end tags and
     * character data for one element type. This class itself does nothing;
     * the real processing should be defined in a subclass.
     */
    public class ElementHandler {

        /**
         * Start of an element
         */
        public void startElement (String name, AttributeList atts)
            throws SAXException {}

        /**
         * End of an element
         */
        public void endElement (String name) throws SAXException {}

        /**
         * Character data
         */
        public void characters (char[] ch, int start, int length)
            throws SAXException {}
    }

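The remark about using a separate interface could be sketched roughly as follows: the contract lives in an interface, and a do-nothing adapter class gives real handlers something convenient to extend. The names `ElementEvents`, `ElementEventsAdapter` and `TextCollector` are hypothetical and are used only to avoid clashing with the ElementHandler class above; this is one possible arrangement, not the book's own code.

```java
import org.xml.sax.AttributeList;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributeListImpl;

public class InterfaceSketch {

    /** Hypothetical interface form of ElementHandler: the contract only. */
    interface ElementEvents {
        void startElement(String name, AttributeList atts) throws SAXException;
        void endElement(String name) throws SAXException;
        void characters(char[] ch, int start, int length) throws SAXException;
    }

    /** Do-nothing implementation, so subclasses override only what they need. */
    static class ElementEventsAdapter implements ElementEvents {
        public void startElement(String name, AttributeList atts) throws SAXException {}
        public void endElement(String name) throws SAXException {}
        public void characters(char[] ch, int start, int length) throws SAXException {}
    }

    /** Example handler that only cares about character data. */
    static class TextCollector extends ElementEventsAdapter {
        final StringBuffer text = new StringBuffer();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Drives a handler by hand through the interface and returns the collected text. */
    static String demo() throws SAXException {
        TextCollector collector = new TextCollector();
        ElementEvents handler = collector;
        handler.startElement("title", new AttributeListImpl());
        char[] data = "Moby Dick".toCharArray();
        handler.characters(data, 0, data.length);
        handler.endElement("title");
        return collector.text.toString();
    }

    public static void main(String[] args) throws SAXException {
        System.out.println(demo()); // prints Moby Dick
    }
}
```

With this split, a dispatcher such as Switcher would depend only on the interface, and a handler that already has another superclass could still implement it directly.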
So far this is all completely general. We could use the Switcher and ElementHandler
classes with any kind of document, to do any kind of processing. Now let's
exploit them for a real application: we want to produce an HTML page showing
selected data from our list of books.
Here's an application that does it. We'll start with the
main control structure. What this does is create a Switcher
and register a number of ElementHandler classes to process
particular elements in the input XML document. It then creates a Parser,
nominates Switcher as the DocumentHandler, and runs the parse.

    import org.xml.sax.*;
    import com.icl.saxon.ParserManager;

    public class DisplayBookList
    {
        public static void main (String args[]) throws Exception
        {
            (new DisplayBookList()).go(args[0]);
        }

        public void go(String input) throws Exception
        {
            Switcher s = new Switcher();
            s.setElementHandler("books", new BooklistHandler());
            s.setElementHandler("book", new BookHandler());
            s.setElementHandler("author", new AuthorHandler());
            s.setElementHandler("title", new TitleHandler());
            s.setElementHandler("price", new PriceHandler());
            s.setElementHandler("volume", new VolumeHandler());

            Parser p = ParserManager.makeParser();
            p.setDocumentHandler(s);
            p.parse(input);
        }

        //...rest of code goes in here...
    }

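The ParserManager class used above comes from the SAXON library. If that is not on the classpath, one alternative (an assumption of this sketch, not something the text prescribes) is to obtain a SAX 1.0 Parser through JAXP, whose `SAXParser.getParser()` method returns an `org.xml.sax.Parser`. The `NameRecorder` handler below is a hypothetical stand-in for Switcher that simply records element names, and the XML is supplied as a string so the example runs without an input file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;

public class JaxpSketch {

    /** Hypothetical stand-in for Switcher: records the name of each start tag. */
    static class NameRecorder extends HandlerBase {
        final List<String> names = new ArrayList<>();

        @Override
        public void startElement(String name, AttributeList atts) {
            names.add(name);
        }
    }

    /** Parses the XML text with a JAXP-supplied SAX 1.0 Parser and returns the element names seen. */
    static List<String> elementNames(String xml) throws Exception {
        // Assumption: JAXP replaces ParserManager.makeParser() here.
        Parser p = SAXParserFactory.newInstance().newSAXParser().getParser();
        NameRecorder recorder = new NameRecorder();
        p.setDocumentHandler(recorder);
        p.parse(new InputSource(new StringReader(xml)));
        return recorder.names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books><book><title>Moby Dick</title></book></books>";
        System.out.println(elementNames(xml)); // prints [books, book, title]
    }
}
```

The SAX 1.0 interfaces are deprecated in current JDKs, so the compiler will report deprecation warnings; the calls themselves are still present.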
The actual element handlers can be defined as inner classes
within the DisplayBookList class: this is useful
because it enables them to share access to data.
The ElementHandler for the outermost element, "books",
causes a skeletal HTML page to be created:

    private class BooklistHandler extends ElementHandler
    {
        public void startElement(String name, AttributeList atts)
        {
            System.out.println("<html>");
            System.out.println("<head><title>Book List</title></head>");
            System.out.println("<body><h1>A List of Books</h1>");
            System.out.println("<table>");
            System.out.println("<tr><th>Author</th>");
            System.out.println("<th>Title</th><th>Price</th></tr>");
        }

        public void endElement(String name)
        {
            System.out.println("</table></body></html>");
        }
    }

The ElementHandler for the repeated "book" element
starts and ends a row in the generated HTML table, and initializes some
variables to hold the data:

    private String author;
    private String title;
    private String price;
    private boolean inVolume;

    private class BookHandler extends ElementHandler
    {
        public void startElement(String name, AttributeList atts)
        {
            author = "";
            title = "";
            price = "";
            inVolume = false;
        }

        public void endElement(String name)
        {
            System.out.println("<tr><td>" + author + "</td>");
            System.out.println("<td>" + title + "</td>");
            System.out.println("<td>" + price + "</td></tr>");
        }
    }

Finally, the element handlers for the fields within the <book> element update
the instance variables of DisplayBookList that hold the data. We're being
careless about performance here in the interests of clarity: it would be better
to use StringBuffers rather than Strings for the variables.

    private class AuthorHandler extends ElementHandler
    {
        public void characters (char[] chars, int start, int len)
        {
            author = author + new String(chars, start, len);
        }
    }

    private class TitleHandler extends ElementHandler
    {
        public void characters (char[] chars, int start, int len)
        {
            if (!inVolume)
            {
                title = title + new String(chars, start, len);
            }
        }
    }

    private class PriceHandler extends ElementHandler
    {
        public void characters (char[] chars, int start, int len)
        {
            if (!inVolume)
            {
                price = price + new String(chars, start, len);
            }
        }
    }

    private class VolumeHandler extends ElementHandler
    {
        public void startElement(String name, AttributeList atts)
        {
            inVolume = true;
        }

        public void endElement(String name)
        {
            inVolume = false;
        }
    }

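The StringBuffer suggestion mentioned above might look something like the following. `FieldBuffer` is a hypothetical helper introduced only for this sketch: it appends each chunk of character data directly to a buffer, instead of building a new String on every callback, and is cleared at the start of each book.

```java
public class BufferSketch {

    /** Hypothetical helper: accumulates the character data for one field. */
    static class FieldBuffer {
        private final StringBuffer buffer = new StringBuffer();

        /** Empties the buffer; would be called where BookHandler resets its Strings. */
        void reset() {
            buffer.setLength(0);
        }

        /** Same arguments as the characters() callback; appends without creating a String. */
        void characters(char[] chars, int start, int len) {
            buffer.append(chars, start, len);
        }

        String value() {
            return buffer.toString();
        }
    }

    /** Simulates a parser delivering one text node in two chunks. */
    static String demo() {
        FieldBuffer author = new FieldBuffer();
        author.reset();
        char[] first = "Herman ".toCharArray();
        char[] second = "xxMelvillexx".toCharArray();
        author.characters(first, 0, first.length);
        author.characters(second, 2, 8); // only the 8 characters of "Melville"
        return author.value();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints Herman Melville
    }
}
```

The two-chunk simulation reflects the fact that a SAX parser is free to split character data across several characters() calls, which is why the handlers accumulate rather than assign.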
The flag inVolume is used to track whether the current element is inside a
containing <volume> element, in which case its character data is ignored. Once
you've put all this together (the full code can be found in the download for
the book at https://www.wrox.com), you can run this on a sample XML file with
a command like this:
>java DisplayBookList file:///c:/data/books2.xml
The following output should then appear:

    <html>
    <head><title>Book List</title></head>
    <body><h1>A List of Books</h1>
    <table>
    <tr><th>Author</th>
    <th>Title</th><th>Price</th></tr>
    <tr><td>Nigel Rees</td>
    <td>Sayings of the Century</td>
    <td>8.95</td></tr>
    <tr><td>Evelyn Waugh</td>
    <td>Sword of Honour</td>
    <td>12.99</td></tr>
    <tr><td>Herman Melville</td>
    <td>Moby Dick</td>
    <td>8.99</td></tr>
    <tr><td>J. R. R. Tolkien</td>
    <td>The Lord of the Rings</td>
    <td>22.99</td></tr>
    </table></body></html>

You can elaborate on this design pattern as much as you
like. Possible enhancements include:
- Providing element handlers with access to a stack containing details of their context
- Selecting element handlers based on conditions other than just the element name
- Using element handlers as part of a pipeline, by allowing them to fire events into another DocumentHandler.
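As one possible take on the first two enhancements, the sketch below keeps a stack of open element names and lets a rule be keyed either by a plain element name or by a "parent/child" pair. The `PathSwitcher` class, the "parent/child" pattern syntax and the sample XML are all hypothetical, and the parser is obtained through JAXP rather than SAXON's ParserManager. With such a rule, a title inside a volume can be skipped without an inVolume flag.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;

public class ContextSketch {

    /** Hypothetical variant of Switcher whose rules may depend on the parent element. */
    static class PathSwitcher extends HandlerBase {
        private static final Consumer<String> IGNORE = text -> { };
        private final Map<String, Consumer<String>> rules = new HashMap<>();
        private final Deque<String> context = new ArrayDeque<>();           // names of open elements
        private final Deque<Consumer<String>> active = new ArrayDeque<>();  // action per open element

        /** Pattern is either "child" or "parent/child"; the action receives character data. */
        void setRule(String pattern, Consumer<String> action) {
            rules.put(pattern, action);
        }

        @Override
        public void startElement(String name, AttributeList atts) {
            String parent = context.peek(); // null when this is the root element
            Consumer<String> action = (parent == null) ? null : rules.get(parent + "/" + name);
            if (action == null) {
                action = rules.getOrDefault(name, IGNORE); // fall back to a name-only rule
            }
            context.push(name);
            active.push(action);
        }

        @Override
        public void endElement(String name) {
            context.pop();
            active.pop();
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (!active.isEmpty()) {
                active.peek().accept(new String(ch, start, length));
            }
        }
    }

    /** Collects only titles that are direct children of a book element. */
    static String demo() throws Exception {
        StringBuilder titles = new StringBuilder();
        PathSwitcher switcher = new PathSwitcher();
        switcher.setRule("book/title", text -> titles.append(text));
        String xml = "<books><book><title>Moby Dick</title>"
                   + "<volume><title>Part One</title></volume></book></books>";
        Parser p = SAXParserFactory.newInstance().newSAXParser().getParser();
        p.setDocumentHandler(switcher);
        p.parse(new InputSource(new StringReader(xml)));
        return titles.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // prints Moby Dick
    }
}
```

A no-op action is pushed for unmatched elements because ArrayDeque does not accept null entries, unlike the java.util.Stack used in Switcher.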
The advantage of this design pattern is that it avoids a
great deal of if-then-else programming. It removes the need to change the
DocumentHandler to add conditional logic every time a new element type is
introduced. Instead, all you need to do is register another element handler.
©1999 Wrox Press Limited, US and UK.