de.dbsystems.simplescrape
Class Tokenizer

java.lang.Object
  extended by de.dbsystems.simplescrape.Tokenizer

public class Tokenizer
extends java.lang.Object

Splits an input stream into HTML tokens. Each token is a tag, a comment, or a piece of text.

Since:
03.04.2007
Author:
Ronald Bieber, DB Systems GmbH

Constructor Summary
Tokenizer(java.io.InputStream in)
          Parse an input stream.
Tokenizer(java.lang.String text)
          Convenience constructor for parsing a string.
 
Method Summary
 AbstractHTMLToken readElement()
          Read the next HTML token from the input stream.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Tokenizer

public Tokenizer(java.lang.String text)
          throws java.io.IOException
Convenience constructor for parsing a string.

Throws:
java.io.IOException

Tokenizer

public Tokenizer(java.io.InputStream in)
          throws java.io.IOException
Parse an input stream.

Throws:
java.io.IOException
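
The stream constructor accepts any java.io.InputStream, so HTML that is already in memory can also be supplied as a stream. The following is a minimal sketch under stated assumptions: the class name StreamInputSketch and the helper toStream are hypothetical, and UTF-8 is an assumed encoding, since this page does not say how Tokenizer decodes bytes. The Tokenizer call appears only as a comment so that the sketch compiles without the library.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class StreamInputSketch {

    // Hypothetical helper: wraps in-memory HTML in an InputStream.
    // UTF-8 is an assumption; the page does not document an encoding.
    static InputStream toStream(String html) {
        return new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = toStream("<p>Hello</p><!-- note -->")) {
            // With the library on the classpath, the stream would be handed over here:
            // Tokenizer tokenizer = new Tokenizer(in);
            System.out.println(in.available() + " bytes ready for tokenizing");
        }
    }
}
```

Both constructors declare java.io.IOException, so the caller has to catch or propagate it even when the input is a plain string.
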
Method Detail

readElement

public AbstractHTMLToken readElement()
                              throws java.io.IOException
Reads the next HTML token from the input stream. To determine what kind of token was returned (text, tag or comment), use the instanceof operator.

Returns:
The next HTML token, or null if the end of the input has been reached.
Throws:
java.io.IOException
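
The read loop described above can be sketched as follows. This is an illustration, not the library's own code. The nested types are stand-ins so that the example compiles by itself: the subclass names TagToken, CommentToken and TextToken are hypothetical, because this page does not list the concrete subclasses of AbstractHTMLToken, and TokenSource is an assumed interface that only mirrors the documented signature of readElement().

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Queue;

public class ReadElementSketch {

    // Stand-ins for the library types. The three subclass names are
    // hypothetical; substitute the real token classes from the library.
    static abstract class AbstractHTMLToken {}
    static class TagToken extends AbstractHTMLToken {}
    static class CommentToken extends AbstractHTMLToken {}
    static class TextToken extends AbstractHTMLToken {}

    // Assumed interface mirroring the documented Tokenizer.readElement() signature.
    interface TokenSource {
        AbstractHTMLToken readElement() throws IOException;
    }

    // Reads until readElement() returns null and counts each kind of token.
    // The result is {tags, comments, texts}.
    static int[] countTokens(TokenSource source) throws IOException {
        int[] counts = new int[3];
        AbstractHTMLToken token;
        while ((token = source.readElement()) != null) {
            if (token instanceof TagToken) {
                counts[0]++;
            } else if (token instanceof CommentToken) {
                counts[1]++;
            } else if (token instanceof TextToken) {
                counts[2]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Canned tokens stand in for real tokenizer output.
        Queue<AbstractHTMLToken> canned = new ArrayDeque<>(Arrays.asList(
                new TagToken(), new TextToken(), new CommentToken(), new TagToken()));
        // Queue.poll() returns null once empty, like readElement() at end of input.
        int[] counts = countTokens(canned::poll);
        System.out.println("tags=" + counts[0] + " comments=" + counts[1] + " text=" + counts[2]);
        // prints "tags=2 comments=1 text=1"
    }
}
```

With the real library, the same loop would call readElement() on a Tokenizer instance directly, and the instanceof checks would name the library's actual token classes.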