de.dbsystems.simplescrape
Class ScrapeOptions

java.lang.Object
  extended by de.dbsystems.simplescrape.ScrapeOptions

public class ScrapeOptions
extends java.lang.Object

Preferences for scraping a file. There are many different options for processing a scraped file, so they are grouped together in this class. This keeps method signatures lean and allows different "sets" of options to be used during scraping.
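The following is a minimal sketch of the "sets of options" idea. To keep it self-contained, it uses a nested stand-in class that mirrors the public fields and documented defaults of ScrapeOptions. In real code you would import de.dbsystems.simplescrape.ScrapeOptions instead. The numeric values of the ELEMENT_ORDER_* constants are hypothetical placeholders, because the real ones are only listed on the Constant Field Values page. The helper names strictOptions() and peekOptions() are also hypothetical. This page does not document how a scraper method consumes the options, so the sketch only builds them.

```java
// Self-contained sketch. The nested ScrapeOptions class is a STAND-IN that mirrors
// the public fields and defaults documented on this page; with the real library,
// import de.dbsystems.simplescrape.ScrapeOptions and drop the stand-in.
class OptionSetsExample {

    static class ScrapeOptions {
        // Hypothetical placeholder values; the real ones are on "Constant Field Values".
        static final int ELEMENT_ORDER_STRICT = 0;
        static final int ELEMENT_ORDER_WHITESPACE_ALLOWED = 1;
        static final int ELEMENT_ORDER_COMMENTS_ALLOWED = 2;
        static final int ELEMENT_ORDER_ELEMENTS_ALLOWED = 3;

        // Documented defaults.
        int elementOrder = ELEMENT_ORDER_COMMENTS_ALLOWED;
        boolean attributesStrict = false;
        boolean ignoreCase = true;
        boolean trimText = true;
        boolean advance = true;
        boolean searchForward = true;
    }

    // Hypothetical "strict" set: exact element sequence, case-sensitive, no trimming.
    static ScrapeOptions strictOptions() {
        ScrapeOptions o = new ScrapeOptions();
        o.elementOrder = ScrapeOptions.ELEMENT_ORDER_STRICT;
        o.attributesStrict = true;
        o.ignoreCase = false;
        o.trimText = false;
        return o;
    }

    // Hypothetical "peek" set: anything may sit between the elements,
    // and the current marker is left where it was.
    static ScrapeOptions peekOptions() {
        ScrapeOptions o = new ScrapeOptions();
        o.elementOrder = ScrapeOptions.ELEMENT_ORDER_ELEMENTS_ALLOWED;
        o.advance = false;
        return o;
    }

    public static void main(String[] args) {
        ScrapeOptions defaults = new ScrapeOptions();
        ScrapeOptions strict = strictOptions();
        ScrapeOptions peek = peekOptions();
        System.out.println("default elementOrder = " + defaults.elementOrder);
        System.out.println("strict ignoreCase    = " + strict.ignoreCase);
        System.out.println("peek advance         = " + peek.advance);
    }
}
```

Because ScrapeOptions exposes public fields and a no-argument constructor, one way to use it is to build each set once and pass the matching instance to each scraping call.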

Since:
05.04.2007
Author:
Ronald Bieber, DB Systems GmbH

Field Summary
 boolean advance
          Specifies whether the internal current marker is advanced during a search operation.
 boolean attributesStrict
          Specifies whether the attributes of provided elements are treated strictly or leniently.
static int ELEMENT_ORDER_COMMENTS_ALLOWED
          Whitespace tokens and comments may appear between the provided elements.
static int ELEMENT_ORDER_ELEMENTS_ALLOWED
          Any other elements, including tags, may appear between the provided elements.
static int ELEMENT_ORDER_STRICT
          No other elements are allowed between the provided elements.
static int ELEMENT_ORDER_WHITESPACE_ALLOWED
          Whitespace tokens may appear between the provided elements.
 int elementOrder
          Specifies how strictly the order of the elements being searched for is enforced.
 boolean ignoreCase
          Specifies whether equality checks ignore or respect case.
 boolean searchForward
          Specifies whether the search is performed forward or backward.
 boolean trimText
          Specifies whether checked text is trimmed first.
 
Constructor Summary
ScrapeOptions()
           
 
Method Summary
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

ELEMENT_ORDER_STRICT

public static final int ELEMENT_ORDER_STRICT
No other elements are allowed between the provided elements. This is a possible value for elementOrder.

See Also:
Constant Field Values

ELEMENT_ORDER_WHITESPACE_ALLOWED

public static final int ELEMENT_ORDER_WHITESPACE_ALLOWED
Whitespace tokens, including line breaks, may appear between the provided elements. This is a possible value for elementOrder.

See Also:
Constant Field Values

ELEMENT_ORDER_COMMENTS_ALLOWED

public static final int ELEMENT_ORDER_COMMENTS_ALLOWED
Whitespace tokens (including line breaks) and comments may appear between the provided elements. This is a possible value for elementOrder.

See Also:
Constant Field Values

ELEMENT_ORDER_ELEMENTS_ALLOWED

public static final int ELEMENT_ORDER_ELEMENTS_ALLOWED
Any other elements, including tags, may appear between the provided elements. This is a possible value for elementOrder.

See Also:
Constant Field Values

elementOrder

public int elementOrder
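Specifies how strictly the order of the elements being searched for is enforced. This ranges from absolutely strict to very lenient. Possible values are:
  ELEMENT_ORDER_STRICT
  ELEMENT_ORDER_WHITESPACE_ALLOWED
  ELEMENT_ORDER_COMMENTS_ALLOWED
  ELEMENT_ORDER_ELEMENTS_ALLOWED
Default value: ELEMENT_ORDER_COMMENTS_ALLOWED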
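The four element-order levels can be read as a rule for which tokens a matcher may skip between two consecutive search elements. The sketch below is a minimal illustration under stated assumptions. The token model (Kind, Token), the helper names mayIntervene() and matchesAt(), the placeholder constant values, and the text-only comparison are all hypothetical and are not the library's API. Tokens are compared by exact text to keep the example short.

```java
import java.util.List;

class ElementOrderSketch {

    // Hypothetical token model; the library's own token classes are not documented here.
    enum Kind { WHITESPACE, COMMENT, TAG, TEXT }

    static final class Token {
        final Kind kind;
        final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
    }

    // Hypothetical placeholder values, ordered from strict to lenient.
    static final int ELEMENT_ORDER_STRICT = 0;
    static final int ELEMENT_ORDER_WHITESPACE_ALLOWED = 1;
    static final int ELEMENT_ORDER_COMMENTS_ALLOWED = 2;
    static final int ELEMENT_ORDER_ELEMENTS_ALLOWED = 3;

    // May a token of this kind sit between two searched-for elements?
    static boolean mayIntervene(Kind kind, int elementOrder) {
        switch (elementOrder) {
            case ELEMENT_ORDER_STRICT:
                return false;
            case ELEMENT_ORDER_WHITESPACE_ALLOWED:
                return kind == Kind.WHITESPACE;
            case ELEMENT_ORDER_COMMENTS_ALLOWED:
                return kind == Kind.WHITESPACE || kind == Kind.COMMENT;
            case ELEMENT_ORDER_ELEMENTS_ALLOWED:
                return true;
            default:
                throw new IllegalArgumentException("unknown elementOrder: " + elementOrder);
        }
    }

    // Does 'pattern' occur in 'stream' starting at 'start', skipping only
    // the tokens that elementOrder permits between consecutive elements?
    static boolean matchesAt(List<Token> stream, List<String> pattern, int start, int elementOrder) {
        int i = start;
        for (int p = 0; p < pattern.size(); p++) {
            if (p > 0) {
                while (i < stream.size()
                        && !stream.get(i).text.equals(pattern.get(p))
                        && mayIntervene(stream.get(i).kind, elementOrder)) {
                    i++; // skip a permitted intervening token
                }
            }
            if (i >= stream.size() || !stream.get(i).text.equals(pattern.get(p))) {
                return false;
            }
            i++;
        }
        return true;
    }

    public static void main(String[] args) {
        // <b>, a line break, a comment, then the text "Price"
        List<Token> stream = List.of(
                new Token(Kind.TAG, "<b>"),
                new Token(Kind.WHITESPACE, "\n"),
                new Token(Kind.COMMENT, "<!-- note -->"),
                new Token(Kind.TEXT, "Price"));
        List<String> pattern = List.of("<b>", "Price");
        System.out.println(matchesAt(stream, pattern, 0, ELEMENT_ORDER_STRICT));             // false
        System.out.println(matchesAt(stream, pattern, 0, ELEMENT_ORDER_WHITESPACE_ALLOWED)); // false
        System.out.println(matchesAt(stream, pattern, 0, ELEMENT_ORDER_COMMENTS_ALLOWED));   // true
        System.out.println(matchesAt(stream, pattern, 0, ELEMENT_ORDER_ELEMENTS_ALLOWED));   // true
    }
}
```

With the default ELEMENT_ORDER_COMMENTS_ALLOWED, the line break and the comment are skipped, so "<b>" followed by "Price" is still found.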


attributesStrict

public boolean attributesStrict
Specifies whether the attributes of provided elements are treated strictly or leniently. Possible values: true (strict) or false (lenient). Default: false


ignoreCase

public boolean ignoreCase
Specifies whether equality checks ignore or respect case. This applies to tags, comments and text alike. Default: true


trimText

public boolean trimText
Specifies whether checked text is trimmed first. If true, surrounding whitespace is ignored in both the search tokens and the analyzed data. Default: true
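The combined effect of ignoreCase and trimText on a single text comparison can be sketched as follows. The helper textMatches() is hypothetical, because this page does not document the library's comparison code. The sketch uses String.trim(), which removes only leading and trailing characters up to U+0020. Whether the library also strips other whitespace, such as non-breaking spaces, is not stated here.

```java
class TextCompareSketch {

    // Hypothetical helper: compares a search token with scraped text
    // according to the ignoreCase and trimText options.
    static boolean textMatches(String searchToken, String scraped,
                               boolean ignoreCase, boolean trimText) {
        // trimText applies to both the search token and the analyzed data.
        String a = trimText ? searchToken.trim() : searchToken;
        String b = trimText ? scraped.trim() : scraped;
        return ignoreCase ? a.equalsIgnoreCase(b) : a.equals(b);
    }

    public static void main(String[] args) {
        String token = "Total";
        String scraped = "  TOTAL\n";
        System.out.println(textMatches(token, scraped, true, true));  // true  (defaults)
        System.out.println(textMatches(token, scraped, false, true)); // false (case differs)
        System.out.println(textMatches(token, scraped, true, false)); // false (whitespace kept)
    }
}
```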


advance

public boolean advance
Specifies whether the internal current marker is advanced during a search operation. Default: true


searchForward

public boolean searchForward
Specifies whether the search is performed forward or backward. Note that a backward search also causes the provided elements to be processed in reverse order.

Example (simplified): given the tokens "a b c d e d c", a backward search (from the end) for "d e" succeeds. The marker then points to the second "d", since it is the first token after the matched "d e". A search for "b a", however, fails, because these tokens do not appear in that order.

Default: true (forward).
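The example above, together with the advance option, can be reproduced with a minimal sketch. It makes several assumptions. Only strict (contiguous) element order is modelled. A backward search starts at the end of the token list. On success, and only if advance is true, the marker moves to the first token after the match, following the documented example. The class DirectionalSearchSketch and its methods are hypothetical and are not the library's API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical minimal scraper: a token list plus the "current marker".
// Only strict (contiguous) element order is modelled here.
class DirectionalSearchSketch {
    private final List<String> tokens;
    private int marker;

    DirectionalSearchSketch(List<String> tokens, boolean searchForward) {
        this.tokens = tokens;
        // Assumption: forward searches start at the beginning, backward ones at the end.
        this.marker = searchForward ? 0 : tokens.size();
    }

    int marker() {
        return marker;
    }

    // Searches for 'pattern' from the marker in the given direction. On success,
    // and only if 'advance' is true, the marker is set to the first token after
    // the match, as in the documented example.
    boolean search(List<String> pattern, boolean searchForward, boolean advance) {
        int n = pattern.size();
        if (searchForward) {
            for (int start = marker; start + n <= tokens.size(); start++) {
                if (matchesAt(start, pattern, false)) {
                    if (advance) marker = start + n;
                    return true;
                }
            }
        } else {
            for (int start = marker - n; start >= 0; start--) {
                if (matchesAt(start, pattern, true)) {
                    if (advance) marker = start + n;
                    return true;
                }
            }
        }
        return false;
    }

    // Compares pattern elements first-to-last, or last-to-first for a backward search.
    private boolean matchesAt(int start, List<String> pattern, boolean reverse) {
        int n = pattern.size();
        for (int k = 0; k < n; k++) {
            int idx = reverse ? n - 1 - k : k;
            if (!tokens.get(start + idx).equals(pattern.get(idx))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("a b c d e d c".split(" "));

        DirectionalSearchSketch backward = new DirectionalSearchSketch(tokens, false);
        System.out.println(backward.search(List.of("d", "e"), false, true)); // true
        System.out.println(backward.marker() + " -> " + tokens.get(backward.marker())); // 5 -> d

        DirectionalSearchSketch backward2 = new DirectionalSearchSketch(tokens, false);
        System.out.println(backward2.search(List.of("b", "a"), false, true)); // false

        DirectionalSearchSketch probe = new DirectionalSearchSketch(tokens, true);
        System.out.println(probe.search(List.of("c", "d"), true, false)); // true
        System.out.println(probe.marker()); // 0, because advance is false
    }
}
```

In this sketch the marker is placed after the match in both directions, so repeating the same backward search would find the same match again. This page does not state where the library resumes a backward search.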

Constructor Detail

ScrapeOptions

public ScrapeOptions()