com.caucho.xml
Class LooseHtml

java.lang.Object
  extended by com.caucho.xml.AbstractParser
      extended by com.caucho.xml.XmlParser
          extended by com.caucho.xml.LooseHtml
All Implemented Interfaces:
org.xml.sax.Parser, org.xml.sax.XMLReader

public class LooseHtml
extends XmlParser

A forgiving HTML parser.

The forgiving parser is useful for extracting information from web pages, since many sites serve HTML that does not strictly follow the standard.

To parse a file into a DOM Document, use:


 Document doc = new LooseHtml().parseDocument("foo.html");
 

To parse a string into a DOM Document, use:


 String html = "<h1>small test</h1>";
 Document doc = new LooseHtml().parseDocumentString(html);
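
Once a Document is available, the standard org.w3c.dom API can be used to read from it. The following is a minimal sketch, not part of this class: DomHeadings, firstText and parseWellFormed are hypothetical names, and the JDK's own DocumentBuilder is used as a stand-in parser so the example runs without Resin on the classpath. It therefore only accepts well-formed markup. A forgiving HTML parser may also normalize tag-name case, so the tag name passed in may need adjusting.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper for reading values out of a parsed DOM Document. */
public class DomHeadings {
  /** Returns the text of the first element named tagName, or null if none exists. */
  public static String firstText(Document doc, String tagName) {
    NodeList nodes = doc.getElementsByTagName(tagName);
    return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
  }

  /** Stand-in parser (assumption): the JDK DocumentBuilder, well-formed input only. */
  public static Document parseWellFormed(String markup) {
    try {
      return DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(markup)));
    } catch (Exception e) {
      throw new IllegalStateException("could not parse markup", e);
    }
  }
}
```

A Document returned by parseDocumentString could be passed to firstText in the same way as one returned by parseWellFormed.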
 

To parse a file using the SAX API, use:


 LooseHtml html = new LooseHtml();
 html.setContentHandler(myContentHandler);
 html.parse("foo.html");
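
The SAX example leaves myContentHandler undefined. One possible handler is sketched below, assuming the goal is to collect the text of every h1 element. HeadingCollector and its collect method are hypothetical names, not part of com.caucho.xml. The collect driver uses the JDK's SAXParserFactory as a stand-in so the sketch is self-contained; it accepts only well-formed markup.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler that collects the text of every h1 element. */
public class HeadingCollector extends DefaultHandler {
  private final List<String> headings = new ArrayList<>();
  private StringBuilder current; // non-null only while inside an h1

  @Override
  public void startElement(String uri, String localName, String qName, Attributes atts) {
    if ("h1".equalsIgnoreCase(qName)) {
      current = new StringBuilder();
    }
  }

  @Override
  public void characters(char[] ch, int start, int length) {
    // SAX may deliver text in several chunks, so append rather than assign.
    if (current != null) {
      current.append(ch, start, length);
    }
  }

  @Override
  public void endElement(String uri, String localName, String qName) {
    if ("h1".equalsIgnoreCase(qName) && current != null) {
      headings.add(current.toString().trim());
      current = null;
    }
  }

  public List<String> getHeadings() {
    return headings;
  }

  /** Stand-in driver (assumption): JDK SAX parser, well-formed input only. */
  public static List<String> collect(String markup) {
    try {
      XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
      HeadingCollector handler = new HeadingCollector();
      reader.setContentHandler(handler);
      reader.parse(new InputSource(new StringReader(markup)));
      return handler.getHeadings();
    } catch (Exception e) {
      throw new IllegalStateException("could not parse markup", e);
    }
  }
}
```

An instance of such a handler would be passed to setContentHandler before calling parse, and getHeadings read afterwards.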
 


Nested Class Summary
 
Nested classes/interfaces inherited from class com.caucho.xml.XmlParser
XmlParser.LocatorImpl
 
Field Summary
 
Fields inherited from class com.caucho.xml.XmlParser
XML, XMLNS
 
Constructor Summary
LooseHtml()
          Creates a new forgiving HTML parser.
 
Method Summary
 
Methods inherited from class com.caucho.xml.XmlParser
free, getColumnNumber, getFilename, getLine, getLineNumber, getPublicId, getSystemId, pushNamespace, setLine, setReader, unread
 
Methods inherited from class com.caucho.xml.AbstractParser
getContentHandler, getDefaultEncoding, getDTDHandler, getEncoding, getEntitiesAsText, getEntityResolver, getErrorHandler, getFeature, getForgiving, getJsp, getProperty, getResinInclude, getSearchPath, getSkipComments, isCoalescing, isDtdValidating, isNamespaceAware, isNamespacePrefixes, isSAXNamespaces, isValidating, openSource, openStream, openStream, openStream, openTopStream, parse, parse, parse, parse, parse, parseDocument, parseDocument, parseDocument, parseDocument, parseDocument, parseDocument, parseDocumentString, parseImpl, parseString, setAutodetectXml, setCoalescing, setConfig, setContentHandler, setDefaultEncoding, setDocumentHandler, setDTDHandler, setDtdValidating, setEntitiesAsText, setEntityResolver, setErrorHandler, setExpandEntities, setFeature, setFilename, setForgiving, setJsp, setLexicalHandler, setLocale, setNamespaceAware, setNamespacePrefixes, setOwner, setProperty, setResinInclude, setSAXNamespaces, setSearchPath, setSkipComments, setSkipWhitespace, setToLower, setValidating
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

LooseHtml

public LooseHtml()
Creates a new forgiving HTML parser.