public class LooseHtml extends XmlParser
LooseHtml is a forgiving HTML parser. It is useful for extracting information from the web, since many sites serve HTML that is not quite standard.
To parse a file into a DOM Document, use:
Document doc = new LooseHtml().parseDocument("foo.html");
To parse a string into a DOM Document, use:
String html = "<h1>small test</h1>";
Document doc = new LooseHtml().parseDocumentString(html);
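Once parseDocument or parseDocumentString has returned a Document, the standard org.w3c.dom API is enough to pull data out of it. The sketch below collects the href attribute of every a element. It assumes the parser exposes HTML elements under their plain, lower-case tag names (an assumption; the inherited setToLower option suggests case handling is configurable). So that it runs without Resin, it builds the Document with the JDK's DocumentBuilder from well-formed markup as a stand-in for LooseHtml's output. The class LinkExtractor and its methods are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LinkExtractor {
    // Hypothetical helper: returns the href of every <a> element, in document order.
    public static List<String> hrefs(Document doc) {
        List<String> out = new ArrayList<>();
        NodeList anchors = doc.getElementsByTagName("a");
        for (int i = 0; i < anchors.getLength(); i++) {
            Element a = (Element) anchors.item(i);
            // getAttribute returns "" when the attribute is absent; skip those anchors.
            String href = a.getAttribute("href");
            if (!href.isEmpty()) {
                out.add(href);
            }
        }
        return out;
    }

    // Stand-in for new LooseHtml().parseDocumentString(html). Unlike LooseHtml,
    // the JDK parser only accepts well-formed markup.
    public static Document parseWellFormed(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseWellFormed(
            "<html><body><a href=\"/a\">A</a><a name=\"x\">X</a><a href=\"/b\">B</a></body></html>");
        System.out.println(hrefs(doc)); // [/a, /b]
    }
}
```

Because hrefs only depends on org.w3c.dom.Document, the same method should work unchanged on a Document produced by LooseHtml, provided the element names match.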
To parse a file using the SAX API, use:
LooseHtml html = new LooseHtml();
html.setContentHandler(myContentHandler);
html.parse("foo.html");
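In the SAX example, myContentHandler is a placeholder; since the method follows SAX naming, it is presumably an org.xml.sax.ContentHandler. A minimal sketch of such a handler, built on org.xml.sax.helpers.DefaultHandler, collects the text of every h1 element. The class name HeadingCollector is hypothetical. Whether LooseHtml reports element names in localName or qName is an assumption, so the handler falls back from one to the other. The main method drives it with the JDK's SAX parser on well-formed input as a stand-in for LooseHtml.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class HeadingCollector extends DefaultHandler {
    private final List<String> headings = new ArrayList<>();
    private StringBuilder current; // non-null only while inside an <h1>

    // Namespace-unaware parsers may leave localName empty and report only qName.
    private static String name(String localName, String qName) {
        return (localName == null || localName.isEmpty()) ? qName : localName;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (name(localName, qName).equalsIgnoreCase("h1")) {
            current = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may split one text run across several characters() calls, so append.
        if (current != null) {
            current.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current != null && name(localName, qName).equalsIgnoreCase("h1")) {
            headings.add(current.toString().trim());
            current = null;
        }
    }

    public List<String> getHeadings() {
        return headings;
    }

    public static void main(String[] args) throws Exception {
        HeadingCollector handler = new HeadingCollector();
        // Stand-in for html.setContentHandler(handler); html.parse(...).
        // The JDK SAX parser requires well-formed input.
        SAXParserFactory.newInstance().newSAXParser().parse(
            new InputSource(new StringReader("<body><h1>small test</h1><p>x</p><h1> two </h1></body>")),
            handler);
        System.out.println(handler.getHeadings()); // [small test, two]
    }
}
```

Streaming through SAX avoids building a full DOM tree, which matters when only a few elements are of interest in a large page.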
Nested classes inherited from class XmlParser: XmlParser.LocatorImpl

| Constructor | Description |
|---|---|
| LooseHtml() | Creates a new forgiving HTML parser. |
Methods inherited from class XmlParser: free, getColumnNumber, getFilename, getLine, getLineNumber, getPublicId, getSystemId, pushNamespace, setLine, setReader, unread

Methods inherited from the superclass of XmlParser: getContentHandler, getDefaultEncoding, getDTDHandler, getEncoding, getEntitiesAsText, getEntityResolver, getErrorHandler, getFeature, getForgiving, getJsp, getProperty, getResinInclude, getSearchPath, getSkipComments, isCoalescing, isDtdValidating, isNamespaceAware, isNamespacePrefixes, isSAXNamespaces, isValidating, openSource, openStream, openStream, openStream, openTopStream, parse, parse, parse, parse, parse, parseDocument, parseDocument, parseDocument, parseDocument, parseDocument, parseDocument, parseDocumentString, parseImpl, parseString, setAutodetectXml, setCoalescing, setConfig, setContentHandler, setDefaultEncoding, setDocumentHandler, setDTDHandler, setDtdValidating, setEntitiesAsText, setEntityResolver, setErrorHandler, setExpandEntities, setFeature, setFilename, setForgiving, setJsp, setLexicalHandler, setLocale, setNamespaceAware, setNamespacePrefixes, setOwner, setProperty, setResinInclude, setSAXNamespaces, setSearchPath, setSkipComments, setSkipWhitespace, setToLower, setValidating