git-svn-id: https://svn.apache.org/repos/asf/xml/commons/trunk@225904 13f79535-47bb-0310-9956-ffa450edef68
503 lines
14 KiB
HTML
503 lines
14 KiB
HTML
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
|
|
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
|
|
|
<html xmlns="http://www.w3.org/1999/xhtml">
|
|
<head>
|
|
<title>SAX2: Quick Start</title>
|
|
<link rel="stylesheet" type="text/css" href="sax-style.css" />
|
|
</head>
|
|
|
|
<body>
|
|
<h1>SAX2: Quick Start</h1>
|
|
|
|
<blockquote>
|
|
<p class="copyright">This document is in the Public Domain.</p>
|
|
</blockquote>
|
|
|
|
<p>This document provides a quick-start tutorial for Java programmers
|
|
who wish to use SAX2 in their programs.</p>
|
|
|
|
|
|
<div>
|
|
<h2>Requirements</h2>
|
|
|
|
<p>SAX is not an XML parser.</p>
|
|
|
|
<p>SAX is a common interface implemented for many different XML
|
|
parsers (and things that pose as XML parsers), just as the JDBC is a
|
|
common interface implemented for many different relational databases
|
|
(and things that pose as relational databases). If you want to use
|
|
SAX, you'll need all of the following:</p>
|
|
|
|
<ul>
|
|
|
|
<li><p>Java 1.1 or higher.</p></li>
|
|
|
|
<li><p>The SAX2 distribution installed on your Java
|
|
classpath.</p></li>
|
|
|
|
<li><p>A SAX2-compatible XML parser installed on your Java
|
|
classpath.</p></li>
|
|
|
|
<li><p>The name of the SAX2 driver class in your XML parser (read the
|
|
documentation that came with the parser, and <em>write down the class
|
|
name</em>).</p></li>
|
|
|
|
</ul>
|
|
|
|
<!-- end of "Requirements" -->
|
|
</div>
|
|
|
|
|
|
<div>
|
|
<h2>Parsing a document</h2>
|
|
|
|
<p>Start by creating a class that extends <a
|
|
href="javadoc/org/xml/sax/helpers/DefaultHandler.html"
|
|
>DefaultHandler</a>:</p>
|
|
|
|
<blockquote><pre xml:space="preserve">
|
|
import org.xml.sax.helpers.DefaultHandler;
|
|
|
|
public class MySAXApp extends DefaultHandler
|
|
{
|
|
|
|
public MySAXApp ()
|
|
{
|
|
super();
|
|
}
|
|
|
|
}
|
|
</pre></blockquote>
|
|
|
|
<p>Now, let's assume that the SAX driver for your XML parser is named
|
|
"com.acme.xml.SAXDriver" (this does not really exist: you
|
|
<em>must</em> find out the name of the real driver for your parser).
|
|
Since this is a Java application, we'll create a static
|
|
<var>main</var> method that creates a new instance of this driver
|
|
(note the "throws Exception" wimp-out):</p>
|
|
|
|
<blockquote><pre xml:space="preserve">
|
|
public static void main (String args[])
|
|
throws Exception
|
|
{
|
|
XMLReader xr = new com.acme.xml.SAXDriver();
|
|
}
|
|
</pre></blockquote>
|
|
|
|
<p>Alternatively, if you don't want to tie your application to a
|
|
specific SAX driver, you can use the <a
|
|
href="javadoc/org/xml/sax/helpers/XMLReaderFactory.html#createXMLReader()">createXMLReader</a>
|
|
method from the <a
|
|
href="javadoc/org/xml/sax/helpers/XMLReaderFactory.html"
|
|
>XMLReaderFactory</a> class to choose a SAX driver dynamically:</p>
|
|
|
|
<blockquote><pre>
|
|
public static void main (String args[])
|
|
throws Exception
|
|
{
|
|
XMLReader xr = XMLReaderFactory.createXMLReader();
|
|
}
|
|
</pre></blockquote>
|
|
|
|
<p>In this case, it will be necessary at runtime to set the
|
|
org.xml.sax.driver Java system property to the full classname of the
|
|
SAX driver, as in</p>
|
|
|
|
<blockquote><pre xml:space="preserve">
|
|
java -Dorg.xml.sax.driver=com.acme.xml.SAXDriver MySAXApp sample.xml
|
|
</pre></blockquote>
|
|
|
|
<p>We can use this object to parse XML documents, but first, we have
|
|
to register event handlers that the parser can use for reporting
|
|
information, using the <a
|
|
href="javadoc/org/xml/sax/XMLReader.html#setContentHandler(org.xml.sax.ContentHandler)"
|
|
>setContentHandler</a> and <a
|
|
href="javadoc/org/xml/sax/XMLReader.html#setErrorHandler(org.xml.sax.ErrorHandler)"
|
|
>setErrorHandler</a> methods from the <a
|
|
href="javadoc/org/xml/sax/XMLReader.html">XMLReader</a> interface. In
|
|
a real-world application, the handlers will usually be separate
|
|
objects, but for this simple demo, we've bundled the handlers into the
|
|
top-level class, so we just have to instantiate the class and register
|
|
it with the XML reader:</p>
|
|
|
|
<blockquote><pre xml:space="preserve">
|
|
public static void main (String args[])
|
|
throws Exception
|
|
{
|
|
XMLReader xr = XMLReaderFactory.createXMLReader();
|
|
MySAXApp handler = new MySAXApp();
|
|
xr.setContentHandler(handler);
|
|
xr.setErrorHandler(handler);
|
|
}
|
|
</pre></blockquote>
|
|
|
|
<p>This code creates an instance of MySAXApp to receive XML parsing
|
|
events, and registers it with the XML reader for regular content
|
|
events and error events (there are other kinds, but they're rarely
|
|
used). Now, let's assume that all of the command-line args are file
|
|
names, and we'll try to parse them one-by-one using the <a
|
|
href="javadoc/org/xml/sax/XMLReader.html#parse(org.xml.sax.InputSource)"
|
|
>parse</a> method from the <a
|
|
href="javadoc/org/xml/sax/XMLReader.html"
|
|
>XMLReader</a> interface:</p>
|
|
|
|
<blockquote><pre xml:space="preserve">
|
|
public static void main (String args[])
|
|
throws Exception
|
|
{
|
|
XMLReader xr = XMLReaderFactory.createXMLReader();
|
|
MySAXApp handler = new MySAXApp();
|
|
xr.setContentHandler(handler);
|
|
xr.setErrorHandler(handler);
|
|
|
|
// Parse each file provided on the
|
|
// command line.
|
|
for (int i = 0; i < args.length; i++) {
|
|
FileReader r = new FileReader(args[i]);
|
|
xr.parse(new InputSource(r));
|
|
}
|
|
}
|
|
</pre></blockquote>
|
|
|
|
<p>Note that each reader must be wrapped in an <a
|
|
href="javadoc/org/xml/sax/InputSource.html"
|
|
>InputSource</a> object to be parsed. Here's the whole demo class
|
|
together (so far):</p>
|
|
|
|
<blockquote><pre xml:space="preserve">
|
|
import java.io.FileReader;
|
|
|
|
import org.xml.sax.XMLReader;
|
|
import org.xml.sax.InputSource;
|
|
import org.xml.sax.helpers.XMLReaderFactory;
|
|
import org.xml.sax.helpers.DefaultHandler;
|
|
|
|
|
|
public class MySAXApp extends DefaultHandler
|
|
{
|
|
|
|
public static void main (String args[])
|
|
throws Exception
|
|
{
|
|
XMLReader xr = XMLReaderFactory.createXMLReader();
|
|
MySAXApp handler = new MySAXApp();
|
|
xr.setContentHandler(handler);
|
|
xr.setErrorHandler(handler);
|
|
|
|
// Parse each file provided on the
|
|
// command line.
|
|
for (int i = 0; i < args.length; i++) {
|
|
FileReader r = new FileReader(args[i]);
|
|
xr.parse(new InputSource(r));
|
|
}
|
|
}
|
|
|
|
|
|
public MySAXApp ()
|
|
{
|
|
super();
|
|
}
|
|
}
|
|
</pre></blockquote>
|
|
|
|
<p>You can compile this code and run it (make sure you specify the SAX
|
|
driver class in the org.xml.sax.driver property), but nothing much
|
|
will happen unless the document contains malformed XML, because you
|
|
have not yet set up your application to handle SAX events.</p>
|
|
|
|
<!-- end of "Parsing a document" -->
|
|
</div>
|
|
|
|
|
|
<div>
|
|
<h2>Handling events</h2>
|
|
|
|
<p>Things get interesting when you start implementing methods to
|
|
respond to XML parsing events (remember that we registered our class
|
|
to receive XML parsing events in the previous section). The most
|
|
important events are the start and end of the document, the start and
|
|
end of elements, and character data.</p>
|
|
|
|
<p>To find out about the start and end of the document, the client
|
|
application implements the <a
|
|
href="javadoc/org/xml/sax/ContentHandler.html#startDocument()"
|
|
>startDocument</a> and <a
|
|
href="javadoc/org/xml/sax/ContentHandler.html#endDocument()"
|
|
>endDocument</a> methods:</p>
|
|
|
|
<blockquote><pre xml:space="preserve">
|
|
public void startDocument ()
|
|
{
|
|
System.out.println("Start document");
|
|
}
|
|
|
|
public void endDocument ()
|
|
{
|
|
System.out.println("End document");
|
|
}
|
|
</pre></blockquote>
|
|
|
|
<p>The start/endDocument event handlers take no arguments. When the
|
|
SAX driver finds the beginning of the document, it will invoke the
|
|
<var>startDocument</var> method once; when it finds the end, it will
|
|
invoke the <var>endDocument</var> method once (if there have been
|
|
errors, endDocument may not be invoked).</p>
|
|
|
|
<p>These examples simply print a message to standard output, but your
|
|
application can contain any arbitrary code in these handlers: most
|
|
commonly, the code will build some kind of an in-memory tree, produce
|
|
output, populate a database, or extract information from the XML
|
|
stream.</p>
|
|
|
|
<p>The SAX driver will signal the start and end of elements in much
|
|
the same way, except that it will also pass some parameters to the <a
|
|
href="javadoc/org/xml/sax/ContentHandler.html#startElement(java.lang.String, java.lang.String, java.lang.String, org.xml.sax.Attributes)">startElement</a> and <a
|
|
href="javadoc/org/xml/sax/ContentHandler.html#endElement(java.lang.String, java.lang.String, java.lang.String)">endElement</a> methods:</p>
|
|
|
|
<blockquote><pre xml:space="preserve">
|
|
public void startElement (String uri, String name,
|
|
String qName, Attributes atts)
|
|
{
|
|
System.out.println("Start element: {" + uri + "}" + name);
|
|
}
|
|
|
|
public void endElement (String uri, String name, String qName)
|
|
{
|
|
System.out.println("End element: {" + uri + "}" + name);
|
|
}
|
|
</pre></blockquote>
|
|
|
|
<p>These methods print a message every time an element starts or ends,
|
|
with the Namespace URI in braces before the element's local name. The
|
|
<var>qName</var> contains the raw XML 1.0 name, and since not all
|
|
SAX drivers will report it, it is best to ignore that parameter for
|
|
now.</p>
|
|
|
|
<p>Finally, SAX2 reports regular character data through the <a
|
|
href="javadoc/org/xml/sax/ContentHandler.html#characters(char[], int, int)"
|
|
>characters</a> method; the following implementation will print all
|
|
character data to the screen; it is a little longer because it
|
|
pretty-prints the output by escaping special characters:</p>
|
|
|
|
<blockquote><pre xml:space="preserve">
|
|
public void characters (char ch[], int start, int length)
|
|
{
|
|
System.out.print("Characters: \"");
|
|
for (int i = start; i < start + length; i++) {
|
|
switch (ch[i]) {
|
|
case '\\':
|
|
System.out.print("\\\\");
|
|
break;
|
|
case '"':
|
|
System.out.print("\\\"");
|
|
break;
|
|
case '\n':
|
|
System.out.print("\\n");
|
|
break;
|
|
case '\r':
|
|
System.out.print("\\r");
|
|
break;
|
|
case '\t':
|
|
System.out.print("\\t");
|
|
break;
|
|
default:
|
|
System.out.print(ch[i]);
|
|
break;
|
|
}
|
|
}
|
|
System.out.print("\"\n");
|
|
}
|
|
</pre></blockquote>
|
|
|
|
<p>Note that a SAX driver is free to chunk the character data any way
|
|
it wants, so you cannot count on all of the character data content of
|
|
an element arriving in a single <var>characters</var> event.</p>
|
|
|
|
<!-- end of "Handling events" -->
|
|
</div>
|
|
|
|
|
|
<div>
|
|
<h2>Sample SAX2 application</h2>
|
|
|
|
<p>Here is the complete sample application (again, in a serious app
|
|
the event handlers would probably be implemented in a separate
|
|
class):</p>
|
|
|
|
<blockquote><pre>
|
|
import java.io.FileReader;
|
|
|
|
import org.xml.sax.XMLReader;
|
|
import org.xml.sax.Attributes;
|
|
import org.xml.sax.InputSource;
|
|
import org.xml.sax.helpers.XMLReaderFactory;
|
|
import org.xml.sax.helpers.DefaultHandler;
|
|
|
|
|
|
public class MySAXApp extends DefaultHandler
|
|
{
|
|
|
|
public static void main (String args[])
|
|
throws Exception
|
|
{
|
|
XMLReader xr = XMLReaderFactory.createXMLReader();
|
|
MySAXApp handler = new MySAXApp();
|
|
xr.setContentHandler(handler);
|
|
xr.setErrorHandler(handler);
|
|
|
|
// Parse each file provided on the
|
|
// command line.
|
|
for (int i = 0; i < args.length; i++) {
|
|
FileReader r = new FileReader(args[i]);
|
|
xr.parse(new InputSource(r));
|
|
}
|
|
}
|
|
|
|
|
|
public MySAXApp ()
|
|
{
|
|
super();
|
|
}
|
|
|
|
|
|
////////////////////////////////////////////////////////////////////
|
|
// Event handlers.
|
|
////////////////////////////////////////////////////////////////////
|
|
|
|
|
|
public void startDocument ()
|
|
{
|
|
System.out.println("Start document");
|
|
}
|
|
|
|
|
|
public void endDocument ()
|
|
{
|
|
System.out.println("End document");
|
|
}
|
|
|
|
|
|
public void startElement (String uri, String name,
|
|
String qName, Attributes atts)
|
|
{
|
|
System.out.println("Start element: {" + uri + "}" + name);
|
|
}
|
|
|
|
|
|
public void endElement (String uri, String name, String qName)
|
|
{
|
|
System.out.println("End element: {" + uri + "}" + name);
|
|
}
|
|
|
|
|
|
public void characters (char ch[], int start, int length)
|
|
{
|
|
System.out.print("Characters: \"");
|
|
for (int i = start; i < start + length; i++) {
|
|
switch (ch[i]) {
|
|
case '\\':
|
|
System.out.print("\\\\");
|
|
break;
|
|
case '"':
|
|
System.out.print("\\\"");
|
|
break;
|
|
case '\n':
|
|
System.out.print("\\n");
|
|
break;
|
|
case '\r':
|
|
System.out.print("\\r");
|
|
break;
|
|
case '\t':
|
|
System.out.print("\\t");
|
|
break;
|
|
default:
|
|
System.out.print(ch[i]);
|
|
break;
|
|
}
|
|
}
|
|
System.out.print("\"\n");
|
|
}
|
|
|
|
}
|
|
</pre></blockquote>
|
|
|
|
<!-- end of "Sample SAX2 application" -->
|
|
</div>
|
|
|
|
|
|
<div>
|
|
<h2>Sample Output</h2>
|
|
|
|
<p>Consider the following XML document:</p>
|
|
|
|
<blockquote><pre xml:space="preserve">
|
|
<?xml version="1.0"?>
|
|
|
|
<poem xmlns="http://www.megginson.com/ns/exp/poetry">
|
|
<title>Roses are Red</title>
|
|
<l>Roses are red,</l>
|
|
<l>Violets are blue;</l>
|
|
<l>Sugar is sweet,</l>
|
|
<l>And I love you.</l>
|
|
</poem>
|
|
</pre></blockquote>
|
|
|
|
<p>If this document is named <code>roses.xml</code> and there is a
|
|
SAX2 driver on your classpath named
|
|
<code>com.acme.xml.SAXDriver</code> (this driver does not actually
|
|
exist), you can invoke the sample application like this:</p>
|
|
|
|
<blockquote><pre xml:space="preserve">
|
|
java -Dcom.acme.xml.SAXDriver MySAXApp roses.xml
|
|
</pre></blockquote>
|
|
|
|
<p>When you run this, you'll get output something like this:</p>
|
|
|
|
<blockquote><pre xml:space="preserve">
|
|
Start document
|
|
Start element: {http://www.megginson.com/ns/exp/poetry}poem
|
|
Characters: "\n"
|
|
Start element: {http://www.megginson.com/ns/exp/poetry}title
|
|
Characters: "Roses are Red"
|
|
End element: {http://www.megginson.com/ns/exp/poetry}title
|
|
Characters: "\n"
|
|
Start element: {http://www.megginson.com/ns/exp/poetry}l
|
|
Characters: "Roses are red,"
|
|
End element: {http://www.megginson.com/ns/exp/poetry}l
|
|
Characters: "\n"
|
|
Start element: {http://www.megginson.com/ns/exp/poetry}l
|
|
Characters: "Violets are blue;"
|
|
End element: {http://www.megginson.com/ns/exp/poetry}l
|
|
Characters: "\n"
|
|
Start element: {http://www.megginson.com/ns/exp/poetry}l
|
|
Characters: "Sugar is sweet,"
|
|
End element: {http://www.megginson.com/ns/exp/poetry}l
|
|
Characters: "\n"
|
|
Start element: {http://www.megginson.com/ns/exp/poetry}l
|
|
Characters: "And I love you."
|
|
End element: {http://www.megginson.com/ns/exp/poetry}l
|
|
Characters: "\n"
|
|
End element: {http://www.megginson.com/ns/exp/poetry}poem
|
|
End document
|
|
</pre></blockquote>
|
|
|
|
<p>Note that even a short document generates (at least) 25 events: one
|
|
for the start and end of each of the six elements used (or, if you
|
|
prefer, one for each start tag and one for each end tag), one of each
|
|
of the eleven chunks of character data (including whitespace between
|
|
elements), one for the start of the document, and one for the end.</p>
|
|
|
|
<!-- end of "Sample Output" -->
|
|
</div>
|
|
|
|
|
|
<hr />
|
|
|
|
<address>$Id$</address>
|
|
|
|
</body>
|
|
|
|
</html>
|
|
|