public class LooseHtml extends XmlParser
This forgiving HTML parser is useful for extracting information from web pages, since many sites serve HTML that is not quite standards-compliant.
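To show why a forgiving parser matters, here is a minimal sketch that uses only the JDK, not Resin. `StrictParseDemo` and `isWellFormedXml` are hypothetical names. The sketch runs markup through the JDK's standard XML parser, which only accepts well-formed input. Typical real-world HTML with unclosed tags such as `<p>` and `<br>` is rejected, and that is the kind of input a forgiving parser like LooseHtml is presumably meant to handle.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical demo class: checks markup against the JDK's strict (non-forgiving) XML parser. */
public class StrictParseDemo {
    /** Returns true if the JDK parser accepts the markup as well-formed XML. */
    public static boolean isWellFormedXml(String markup) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors and keeps the default "[Fatal Error]" message off stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // Unclosed tags, stray end tags and similar "tag soup" end up here.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isWellFormedXml("<h1>small test</h1>"));               // true
        System.out.println(isWellFormedXml("<p>unclosed paragraph<br>line two")); // false
    }
}
```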
To parse a file into a DOM Document, use:
Document doc = new LooseHtml().parseDocument("foo.html");
To parse a string into a DOM Document, use:
String html = "<h1>small test</h1>";
Document doc = new LooseHtml().parseDocumentString(html);
To parse a file with the SAX API, use:
LooseHtml html = new LooseHtml();
html.setContentHandler(myContentHandler);
html.parse("foo.html");
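The parse methods return an ordinary `org.w3c.dom.Document`, so the standard DOM API can be used to pull information out of it. The sketch below assumes a hypothetical helper, `HeadingExtractor.textOf`, that collects the text of every element with a given tag name. Resin is not on the classpath here, so `main` builds the `Document` with the JDK's parser from well-formed input as a stand-in for `parseDocumentString`. Whether LooseHtml normalizes tag-name case is an assumption worth checking, since `getElementsByTagName` is case-sensitive.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: works on any DOM Document, including one returned by a forgiving parser. */
public class HeadingExtractor {
    /** Collects the trimmed text content of every element named tagName, in document order. */
    public static List<String> textOf(Document doc, String tagName) {
        List<String> result = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName(tagName);
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent().trim());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for parseDocumentString(html): the JDK parser requires well-formed input.
        String html = "<html><body><h1>small test</h1><p>text</p><h1>second</h1></body></html>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(html)));
        System.out.println(textOf(doc, "h1")); // [small test, second]
    }
}
```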
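The SAX example leaves `myContentHandler` unspecified. Here is a minimal sketch of one possible handler, a hypothetical `LinkCollector` that records the `href` of every `<a>` element. It extends the JDK's `org.xml.sax.helpers.DefaultHandler`, which implements `ContentHandler`, so an instance could be passed to `setContentHandler`. Because Resin is not assumed to be available, `main` drives the handler with the JDK's SAX parser on well-formed input instead of LooseHtml.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical ContentHandler that collects the href attribute of every <a> element. */
public class LinkCollector extends DefaultHandler {
    private final List<String> links = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        // Without namespace processing, localName may be empty, so fall back to qName.
        String name = (localName == null || localName.isEmpty()) ? qName : localName;
        if (name.equalsIgnoreCase("a")) {
            String href = attrs.getValue("href");
            if (href != null) { // skip anchors such as <a name="x"> that have no href
                links.add(href);
            }
        }
    }

    public List<String> getLinks() {
        return links;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><a href=\"/one\">1</a><a name=\"x\">none</a>"
                + "<a href=\"/two\">2</a></body></html>";
        LinkCollector collector = new LinkCollector();
        // Stand-in for html.setContentHandler(collector); html.parse(...):
        // the JDK SAX parser fires the same ContentHandler callbacks.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(page)), collector);
        System.out.println(collector.getLinks()); // [/one, /two]
    }
}
```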
Nested classes inherited from class XmlParser: XmlParser.LocatorImpl
Constructor | Description
---|---
LooseHtml() | Creates a new forgiving HTML parser.
Methods inherited from class XmlParser:
free, getColumnNumber, getFilename, getLine, getLineNumber, getPublicId, getSystemId, pushNamespace, setLine, setReader, unread
Methods inherited from class AbstractParser:
getContentHandler, getDefaultEncoding, getDTDHandler, getEncoding, getEntitiesAsText, getEntityResolver, getErrorHandler, getFeature, getForgiving, getJsp, getProperty, getResinInclude, getSearchPath, getSkipComments, isCoalescing, isDtdValidating, isNamespaceAware, isNamespacePrefixes, isSAXNamespaces, isValidating, openSource, openStream, openStream, openStream, openTopStream, parse, parse, parse, parse, parse, parseDocument, parseDocument, parseDocument, parseDocument, parseDocument, parseDocument, parseDocumentString, parseImpl, parseString, setAutodetectXml, setCoalescing, setConfig, setContentHandler, setDefaultEncoding, setDocumentHandler, setDTDHandler, setDtdValidating, setEntitiesAsText, setEntityResolver, setErrorHandler, setExpandEntities, setFeature, setFilename, setForgiving, setJsp, setLexicalHandler, setLocale, setNamespaceAware, setNamespacePrefixes, setOwner, setProperty, setResinInclude, setSAXNamespaces, setSearchPath, setSkipComments, setSkipWhitespace, setToLower, setValidating