Class LooseHtml

java.lang.Object
  extended by com.caucho.xml.AbstractParser
      extended by com.caucho.xml.XmlParser
          extended by com.caucho.xml.LooseHtml
All Implemented Interfaces:
org.xml.sax.Parser, org.xml.sax.XMLReader

public class LooseHtml
extends XmlParser

A forgiving HTML parser interface.

The forgiving HTML parser is useful for extracting information from the web, since many sites serve not-quite-standard HTML.

To parse a file into a DOM Document use

 Document doc = new LooseHtml().parseDocument("foo.html");

To parse a string into a DOM Document use

 String html = "<h1>small test</h1>";
 Document doc = new LooseHtml().parseDocumentString(html);
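
Once a Document has been returned, it can be walked with the standard org.w3c.dom API. The following is a minimal sketch, not taken from the Resin documentation: DomHeadings, headings and parseStrict are hypothetical names, and parseStrict uses the JDK's strict XML parser as a stand-in so the example runs without Resin on the classpath. With Resin available, the Document would come from the parseDocumentString call above instead. Whether the forgiving parser reports tag names in upper or lower case is not stated on this page, so the lookup for "h1" is an assumption.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomHeadings {
  // Collects the trimmed text of every <h1> element in a parsed document.
  static List<String> headings(Document doc) {
    List<String> result = new ArrayList<>();
    NodeList nodes = doc.getElementsByTagName("h1");
    for (int i = 0; i < nodes.getLength(); i++) {
      result.add(nodes.item(i).getTextContent().trim());
    }
    return result;
  }

  // Stand-in for the forgiving parser (assumption): the JDK's strict XML
  // parser, which only accepts well-formed markup.
  static Document parseStrict(String markup) throws Exception {
    return DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new InputSource(new StringReader(markup)));
  }

  public static void main(String[] args) throws Exception {
    Document doc = parseStrict("<h1>small test</h1>");
    System.out.println(headings(doc)); // prints [small test]
  }
}
```

The headings method depends only on org.w3c.dom.Document, so it does not care which parser produced the tree.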

To parse a file using the SAX API use

 LooseHtml html = new LooseHtml();
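
The SAX snippet above shows only the construction of the parser. Because the class implements org.xml.sax.XMLReader, and setContentHandler and parse appear among the inherited methods, a handler can presumably be registered and the parse started in the usual SAX way. The sketch below illustrates that pattern and is not Resin's documented usage: LinkHandler and collect are hypothetical names, and main uses the JDK's strict SAX parser as a stand-in XMLReader so the example runs without Resin.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class LinkHandler extends DefaultHandler {
  // href values seen so far, in document order.
  final List<String> hrefs = new ArrayList<>();

  @Override
  public void startElement(String uri, String localName, String qName, Attributes attrs) {
    // Compare case-insensitively, since the tag-name case reported by a
    // forgiving HTML parser is an unknown here.
    if ("a".equalsIgnoreCase(qName)) {
      String href = attrs.getValue("href");
      if (href != null) {
        hrefs.add(href);
      }
    }
  }

  // Registers a fresh handler on any XMLReader, parses the markup,
  // and returns the collected href values.
  static List<String> collect(XMLReader reader, String markup) throws Exception {
    LinkHandler handler = new LinkHandler();
    reader.setContentHandler(handler);
    reader.parse(new InputSource(new StringReader(markup)));
    return handler.hrefs;
  }

  public static void main(String[] args) throws Exception {
    // Stand-in reader (assumption): the JDK's strict SAX parser.
    XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    System.out.println(collect(reader, "<p><a href=\"/docs\">docs</a></p>")); // prints [/docs]
  }
}
```

Since collect accepts any XMLReader, a parser instance constructed as above could be passed in place of the JDK reader when Resin is on the classpath.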

Nested Class Summary
Nested classes/interfaces inherited from class com.caucho.xml.XmlParser
Field Summary
Fields inherited from class com.caucho.xml.XmlParser
Constructor Summary
LooseHtml()
          Creates a new forgiving HTML parser.
Method Summary
Methods inherited from class com.caucho.xml.XmlParser
free, getColumnNumber, getFilename, getLine, getLineNumber, getPublicId, getSystemId, pushNamespace, setLine, setReader, unread
Methods inherited from class com.caucho.xml.AbstractParser
getContentHandler, getDefaultEncoding, getDTDHandler, getEncoding, getEntitiesAsText, getEntityResolver, getErrorHandler, getFeature, getForgiving, getJsp, getProperty, getResinInclude, getSearchPath, getSkipComments, isCoalescing, isDtdValidating, isNamespaceAware, isNamespacePrefixes, isSAXNamespaces, isValidating, openSource, openStream, openStream, openStream, openTopStream, parse, parse, parse, parse, parse, parseDocument, parseDocument, parseDocument, parseDocument, parseDocument, parseDocument, parseDocumentString, parseImpl, parseString, setAutodetectXml, setCoalescing, setConfig, setContentHandler, setDefaultEncoding, setDocumentHandler, setDTDHandler, setDtdValidating, setEntitiesAsText, setEntityResolver, setErrorHandler, setExpandEntities, setFeature, setFilename, setForgiving, setJsp, setLexicalHandler, setLocale, setNamespaceAware, setNamespacePrefixes, setOwner, setProperty, setResinInclude, setSAXNamespaces, setSearchPath, setSkipComments, setSkipWhitespace, setToLower, setValidating
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail


public LooseHtml()
Creates a new forgiving HTML parser.