de.dbsystems.simplescrape
Class HTMLTag

java.lang.Object
  extended by de.dbsystems.simplescrape.AbstractHTMLToken
      extended by de.dbsystems.simplescrape.HTMLTag

public class HTMLTag
extends AbstractHTMLToken

Represents tags in HTML files. One object is created for every opening tag, every closing tag, and every unary tag. Attributes are parsed and stored separately and can be accessed through the getAttributes() method.

Since:
04.04.2007
Author:
Ronald Bieber, DB Systems GmbH

Constructor Summary
HTMLTag(java.lang.String tagContent)
          Create an HTML tag.
 
Method Summary
 boolean attributesMatch(HTMLTag a, HTMLTag b, ScrapeOptions options)
          Test the attribute sets of two tags.
 HTMLTagAttributes getAttributes()
          Returns the attributes of this tag.
 java.lang.String getName()
          Returns the name of this tag.
 boolean isEndTag()
          Whether or not the tag is an end tag.
 boolean isUnaryTag()
          Whether or not the tag is a unary tag.
 boolean match(AbstractHTMLToken other, ScrapeOptions options)
          Determines whether two tokens match.
 java.lang.String toString()
          Returns an HTML representation of this tag.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

HTMLTag

public HTMLTag(java.lang.String tagContent)
Create an HTML tag. Pass in the content of the tag without the angle brackets, e.g. for the tag "<body bgcolor=#ffffff>" pass in "body bgcolor=#ffffff".
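
The convention of passing only the tag content can be illustrated with a small, self-contained sketch. The class TagContentSketch and its methods are hypothetical helpers written for this example; they are not part of simplescrape and make no claim about how HTMLTag parses its argument internally.

```java
// Hypothetical helper, not part of simplescrape: shows how a raw tag could be
// reduced to the "tag content" form that the HTMLTag constructor expects.
final class TagContentSketch {

    /** Removes one leading '<' and one trailing '>' if both are present. */
    static String stripBrackets(String rawTag) {
        String s = rawTag.trim();
        if (s.length() >= 2 && s.startsWith("<") && s.endsWith(">")) {
            return s.substring(1, s.length() - 1).trim();
        }
        return s;
    }

    /** Returns the first whitespace-delimited word of the tag content. */
    static String tagName(String tagContent) {
        String s = tagContent.trim();
        int end = 0;
        while (end < s.length() && !Character.isWhitespace(s.charAt(end))) {
            end++;
        }
        return s.substring(0, end);
    }

    /** Returns everything after the tag name, i.e. the unparsed attribute text. */
    static String attributeText(String tagContent) {
        String s = tagContent.trim();
        return s.substring(tagName(s).length()).trim();
    }

    public static void main(String[] args) {
        String content = stripBrackets("<body bgcolor=#ffffff>");
        System.out.println(content);                 // body bgcolor=#ffffff
        System.out.println(tagName(content));        // body
        System.out.println(attributeText(content));  // bgcolor=#ffffff
    }
}
```

The string produced by stripBrackets is the kind of value that would be handed to new HTMLTag(...).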

Method Detail

attributesMatch

public boolean attributesMatch(HTMLTag a,
                               HTMLTag b,
                               ScrapeOptions options)
Test the attribute sets of two tags. Note: The order of a and b matters. Depending on the ScrapeOptions provided, a may contain more attributes than b without failing the test.

Parameters:
a - The tag under test.
b - The reference to be tested against.
options - Relevant options for this comparison are attributesStrict and ignoreCase.
Returns:
true if the attributes match, false otherwise.
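
The asymmetry between a and b can be made concrete with a sketch that uses plain maps in place of HTMLTagAttributes. Everything here is an assumption made for illustration: the class AttributeMatchSketch is hypothetical, the two boolean parameters merely borrow the names of the attributesStrict and ignoreCase options, and applying ignoreCase to both attribute names and values is a guess rather than documented behaviour.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in, not the simplescrape implementation.
final class AttributeMatchSketch {

    /** Lower-cases names and values when ignoreCase is set (an assumption of this sketch). */
    static Map<String, String> normalize(Map<String, String> attrs, boolean ignoreCase) {
        Map<String, String> out = new HashMap<>();
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            String name = ignoreCase ? e.getKey().toLowerCase(Locale.ROOT) : e.getKey();
            String value = ignoreCase ? e.getValue().toLowerCase(Locale.ROOT) : e.getValue();
            out.put(name, value);
        }
        return out;
    }

    /**
     * a is the tag under test, b is the reference.
     * Strict: both attribute sets must be identical.
     * Non-strict: every attribute of b must appear in a; a may carry extras.
     */
    static boolean attributesMatch(Map<String, String> a, Map<String, String> b,
                                   boolean attributesStrict, boolean ignoreCase) {
        Map<String, String> na = normalize(a, ignoreCase);
        Map<String, String> nb = normalize(b, ignoreCase);
        if (attributesStrict) {
            return na.equals(nb);
        }
        return na.entrySet().containsAll(nb.entrySet());
    }

    public static void main(String[] args) {
        Map<String, String> underTest = new HashMap<>();
        underTest.put("bgcolor", "#ffffff");
        underTest.put("class", "main");

        Map<String, String> reference = new HashMap<>();
        reference.put("bgcolor", "#FFFFFF");

        System.out.println(attributesMatch(underTest, reference, false, true)); // true
        System.out.println(attributesMatch(reference, underTest, false, true)); // false
        System.out.println(attributesMatch(underTest, reference, true, true));  // false
    }
}
```

Swapping the two arguments changes the result in non-strict mode, which is why the parameter order matters.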

match

public boolean match(AbstractHTMLToken other,
                     ScrapeOptions options)
Description copied from class: AbstractHTMLToken
Determines whether two tokens match.

Specified by:
match in class AbstractHTMLToken
Parameters:
other - The search token to be tested against.
options - A set of options. Relevant options are attributesStrict, trimText and ignoreCase.
Returns:
true if the two tokens match, false otherwise.

getAttributes

public HTMLTagAttributes getAttributes()
Returns the attributes of this tag.

Returns:
The attributes, if there are any, or null otherwise

getName

public java.lang.String getName()
Returns the name of this tag.

Returns:
The name

isEndTag

public boolean isEndTag()
Whether or not the tag is an end tag.


isUnaryTag

public boolean isUnaryTag()
Whether or not the tag is a unary tag, i.e. a tag that stands alone without a separate end tag.
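
How isEndTag() and isUnaryTag() decide is not documented here. The following sketch only illustrates the distinction in terms of HTML syntax, using the tag-content form accepted by the constructor. The class TagKindSketch is hypothetical, and treating a trailing slash as the mark of a unary tag is an assumption of this example (it would, for instance, misclassify a tag whose last unquoted attribute value ends in a slash).

```java
// Hypothetical classifier, not the simplescrape implementation.
final class TagKindSketch {

    /** An end tag's content starts with a slash, e.g. "/body". */
    static boolean looksLikeEndTag(String tagContent) {
        return tagContent.trim().startsWith("/");
    }

    /** Assumption: a unary tag's content ends with a slash, e.g. "br /". */
    static boolean looksLikeUnaryTag(String tagContent) {
        String s = tagContent.trim();
        return s.length() > 1 && s.endsWith("/") && !s.startsWith("/");
    }

    public static void main(String[] args) {
        String[] samples = {"body bgcolor=#ffffff", "/body", "br /"};
        for (String content : samples) {
            System.out.println(content
                    + " -> end=" + looksLikeEndTag(content)
                    + ", unary=" + looksLikeUnaryTag(content));
        }
    }
}
```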


toString

public java.lang.String toString()
Returns an HTML representation of this tag. The attributes are returned the way they were originally provided (not normalized).

Overrides:
toString in class java.lang.Object