Class Summary | |
---|---|
AbstractHTMLToken | Common superclass for all tokens that can be found in an HTML-file. |
HTMLComment | Class for holding HTML-comments. |
HTMLTag | Represents tags in HTML-files. |
HTMLTagAttributes | Parses and stores the attributes of an HTML tag. |
HTTPHelper | Helper class for HTTP-related operations. |
RegExTextToken | Basically a TextToken, but whose content is treated as a regular expression. |
ScrapeOptions | Preferences for scraping a file. |
Scraper | Central class for this package. |
TextToken | Represents tokens containing text data in an HTML-file. |
Tokenizer | Splits an input stream into HTML tokens. |
XMLHelper | Helper class for XML-related operations. |

The webscraping package enables quick, programmatic extraction of information from HTML pages.
The package is currently a usable alpha version: it is not yet feature complete, but it can already be used (at your own risk, of course).
Usage examples can be found in the JUnit test cases under /test/.../
Simple-Scrape is intended to be used programmatically, that is, called from your own Java code.
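
The sketch below only illustrates the token model from the class summary (tags, comments and text), and it does not use Simple-Scrape's real API. `TokenSketch`, `Kind`, `Token` and `tokenize` are hypothetical names invented for this example, and the regex-based splitting is an assumption about one simple way to tokenize, not a description of how the package's Tokenizer works. For actual usage, refer to the JUnit test cases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: none of these names come from Simple-Scrape itself.
class TokenSketch {

    // The three token kinds mirror the class summary: tag, comment, text.
    enum Kind { TAG, COMMENT, TEXT }

    static final class Token {
        final Kind kind;
        final String content;

        Token(Kind kind, String content) {
            this.kind = kind;
            this.content = content;
        }

        @Override
        public String toString() {
            return kind + "[" + content + "]";
        }
    }

    // The comment alternative comes first so "<!-- ... -->" is not read as a tag.
    private static final Pattern MARKUP =
            Pattern.compile("<!--(.*?)-->|<([^>]*)>", Pattern.DOTALL);

    // Splits the input into tags, comments, and the text between them.
    static List<Token> tokenize(String html) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = MARKUP.matcher(html);
        int pos = 0;
        while (m.find()) {
            addText(tokens, html.substring(pos, m.start()));
            if (m.group(1) != null) {
                tokens.add(new Token(Kind.COMMENT, m.group(1).trim()));
            } else {
                tokens.add(new Token(Kind.TAG, m.group(2).trim()));
            }
            pos = m.end();
        }
        addText(tokens, html.substring(pos));
        return tokens;
    }

    // Whitespace-only text between markup is dropped; other text is trimmed.
    private static void addText(List<Token> tokens, String text) {
        String trimmed = text.trim();
        if (!trimmed.isEmpty()) {
            tokens.add(new Token(Kind.TEXT, trimmed));
        }
    }

    public static void main(String[] args) {
        for (Token t : tokenize("<p>Price: <b>42</b></p><!-- end -->")) {
            System.out.println(t);
        }
    }
}
```

Running `main` prints one token per line: `TAG[p]`, `TEXT[Price:]`, `TAG[b]`, `TEXT[42]`, `TAG[/b]`, `TAG[/p]` and `COMMENT[end]`. A scraper built on such a token list could then look for a known sequence of tags and pick out the text tokens in between.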
This project was developed using Eclipse 3.2; the included .project and .classpath files reflect that origin.