net.jxta.search.util
Class XmlParser
java.lang.Object
|
+--net.jxta.search.util.XmlParser
- public class XmlParser
- extends java.lang.Object
A very lightweight, non-validating XML parser. The callback mechanism
is flexible enough to build a stack of tags, or to handle more specific
tasks such as link extraction or indexing of both text and hrefs.
Usage:
XmlParser parser = new XmlParser ();
XmlParser.ParserCallback callback = new XmlParser.ParserCallback () {
public void startTag (byte[] chars, int start, int len) {
System.out.println ("Start tag: " +
new String (chars, start, len));
}
public void chars (byte[] chars, int start, int len) {
System.out.println ("Chars: ." +
new String (chars, start, len) +
".");
}
public void endTag (byte[] chars, int start, int len) {
System.out.println ("End tag: " +
new String (chars, start, len));
}
};
byte[] buf = new byte[256];
InputStream reader = System.in;
parser.parse (reader, buf, callback);
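The description above mentions building a stack of tags from the callbacks. The following is a minimal sketch of that idea, not the library's own code. So that it compiles without the JXTA classes, it declares a hypothetical `TagCallback` interface that mirrors the three methods shown in the usage example, and it drives the callback by hand. It also assumes that the byte range passed to `startTag` holds just the tag name; the source does not say exactly which bytes the parser passes.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for XmlParser.ParserCallback, declared here only
// so that the sketch is self-contained.
interface TagCallback {
    void startTag(byte[] chars, int start, int len);
    void chars(byte[] chars, int start, int len);
    void endTag(byte[] chars, int start, int len);
}

// Keeps a stack of the currently open tags and logs each text run
// together with the innermost open tag.
class TagStackCallback implements TagCallback {
    final Deque<String> open = new ArrayDeque<>();
    final StringBuilder log = new StringBuilder();

    private static String text(byte[] chars, int start, int len) {
        return new String(chars, start, len, StandardCharsets.ISO_8859_1);
    }

    public void startTag(byte[] chars, int start, int len) {
        open.push(text(chars, start, len));
    }

    public void chars(byte[] chars, int start, int len) {
        String current = open.isEmpty() ? "" : open.peek();
        log.append(current).append(':').append(text(chars, start, len)).append('\n');
    }

    public void endTag(byte[] chars, int start, int len) {
        // Assumes well-nested input; a stray end tag is ignored.
        if (!open.isEmpty()) {
            open.pop();
        }
    }
}

class TagStackDemo {
    public static void main(String[] args) {
        TagStackCallback cb = new TagStackCallback();
        byte[] buf = "titleHello".getBytes(StandardCharsets.ISO_8859_1);
        // Simulate the events a parser might emit for a title element.
        cb.startTag(buf, 0, 5);
        cb.chars(buf, 5, 5);
        cb.endTag(buf, 0, 5);
        System.out.print(cb.log);
    }
}
```

With the real class, the same three methods would go into an `XmlParser.ParserCallback` implementation passed to `parse`.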
Method Summary
static java.lang.String getAttributeValue(byte[] attributeName, byte[] tag, int start, int len)
    Parse the attribute value of a given attribute name in a tag.
static int getIntAttribute(byte[] attributeName, byte[] tag, int start, int len, int defaultValue)
static void main(java.lang.String[] argv)
static void parse(java.io.InputStream reader, byte[] chars, XmlParser.ParserCallback callback)
    Parse the input stream as html.
static boolean startsWith(byte[] what, byte[] chars, int start, int len)
    Check whether a given character buffer starts with a certain set of characters.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
XmlParser
public XmlParser()
parse
public static void parse(java.io.InputStream reader,
byte[] chars,
XmlParser.ParserCallback callback)
throws java.io.IOException,
XmlParser.Exception
- Parse the input stream as html. Note that tags longer than the
character buffer will be thrown away. With a sufficiently large
buffer, say 8k, this should not matter in practice.
- Parameters:
reader - the input stream (unbuffered, hopefully) from which to read html
chars - the character buffer to use when reading
callback - the callback interface implementation
getAttributeValue
public static java.lang.String getAttributeValue(byte[] attributeName,
byte[] tag,
int start,
int len)
- Parse the attribute value of a given attribute name in a tag.
- Parameters:
attributeName - the name of the attribute
tag - the tag in which to look for the attribute
- Returns:
the attribute's value, or null if the tag doesn't contain that attribute.
getIntAttribute
public static int getIntAttribute(byte[] attributeName,
byte[] tag,
int start,
int len,
int defaultValue)
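The page gives no description for getIntAttribute. One plausible reading is that it looks up the attribute as getAttributeValue does and returns defaultValue when the attribute is missing or is not a number. The sketch below illustrates that reading only. `AttributeSketch`, `attributeValue` and `intAttribute` are hypothetical stand-ins, not the library's code, and the handling of double-quoted values only is an assumption.

```java
import java.nio.charset.StandardCharsets;

class AttributeSketch {
    private static boolean isSpace(byte b) {
        return b == ' ' || b == '\t' || b == '\n' || b == '\r';
    }

    // True if 'what' occurs in 'chars' at offset 'at'. The caller
    // guarantees that at + what.length does not exceed the buffer.
    private static boolean matches(byte[] what, byte[] chars, int at) {
        for (int k = 0; k < what.length; k++) {
            if (chars[at + k] != what[k]) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical stand-in for getAttributeValue: scans the range
    // tag[start, start + len) for name="value" and returns value,
    // or null if the attribute is absent. Double quotes only.
    static String attributeValue(byte[] name, byte[] tag, int start, int len) {
        int end = start + len;
        for (int i = start; i + name.length + 1 < end; i++) {
            // The name must follow whitespace, so "id" does not match inside "grid".
            if (i == start || !isSpace(tag[i - 1]) || !matches(name, tag, i)) {
                continue;
            }
            int p = i + name.length;
            if (tag[p] != '=' || tag[p + 1] != '"') {
                continue;
            }
            int valueStart = p + 2;
            for (int q = valueStart; q < end; q++) {
                if (tag[q] == '"') {
                    return new String(tag, valueStart, q - valueStart,
                                      StandardCharsets.ISO_8859_1);
                }
            }
            return null; // unterminated quote
        }
        return null;
    }

    // Assumed semantics for getIntAttribute: fall back to defaultValue
    // when the attribute is missing or does not parse as an int.
    static int intAttribute(byte[] name, byte[] tag, int start, int len, int defaultValue) {
        String value = attributeValue(name, tag, start, len);
        if (value == null) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(value.trim());
        } catch (NumberFormatException e) {
            return defaultValue;
        }
    }

    public static void main(String[] args) {
        byte[] tag = "img src=\"a.png\" width=\"40\"".getBytes(StandardCharsets.ISO_8859_1);
        byte[] width = "width".getBytes(StandardCharsets.ISO_8859_1);
        byte[] height = "height".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(intAttribute(width, tag, 0, tag.length, -1));
        System.out.println(intAttribute(height, tag, 0, tag.length, -1));
    }
}
```

In a callback, the `chars`, `start` and `len` arguments received by `startTag` would be passed through as `tag`, `start` and `len`.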
startsWith
public static boolean startsWith(byte[] what,
byte[] chars,
int start,
int len)
- Check whether a given character buffer starts with a certain set of characters.
- Parameters:
chars - the character buffer
what - the set of characters the buffer might start with.
- Returns:
true if the character buffer starts with what
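To illustrate the check described above, here is a small sketch of a prefix test over a byte range. `StartsWithSketch` is a hypothetical stand-in rather than the library's implementation. It assumes that `start` and `len` delimit the region of `chars` to inspect and that the comparison is case-sensitive; the source confirms neither.

```java
import java.nio.charset.StandardCharsets;

class StartsWithSketch {
    // Returns true if the region chars[start, start + len) begins with
    // the bytes in 'what'. Assumes start + len lies within 'chars'.
    static boolean startsWith(byte[] what, byte[] chars, int start, int len) {
        if (what.length > len) {
            return false; // the prefix cannot fit in the region
        }
        for (int i = 0; i < what.length; i++) {
            if (chars[start + i] != what[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] buf = "<a href".getBytes(StandardCharsets.ISO_8859_1);
        byte[] prefix = "<a".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(startsWith(prefix, buf, 0, buf.length));
    }
}
```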
main
public static void main(java.lang.String[] argv)
throws java.io.IOException,
XmlParser.Exception