lucene-java-user mailing list archives

From Bill Tschumy
Subject Re: which HTML parser is better?
Date Thu, 03 Feb 2005 01:59:09 GMT
No one has yet mentioned using ParserDelegator and
HTMLEditorKit.ParserCallback, which ship with Swing (under
javax.swing.text.html).  I have been successfully using these classes
to parse out the text of an HTML file.  You just need to extend
HTMLEditorKit.ParserCallback and override the methods that are called
when tags and text are encountered.
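
As a rough illustration of the approach (a minimal sketch, not the code I actually use): the class name HtmlTextExtractor and its extract() helper are made up for this example, while ParserDelegator.parse() and ParserCallback.handleText() are the standard Swing calls. It only collects text runs and ignores every tag; exactly how whitespace between runs comes out depends on the Swing parser.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: gathers the text content of an HTML document.
class HtmlTextExtractor extends HTMLEditorKit.ParserCallback {
    private final StringBuilder text = new StringBuilder();

    // Called by the parser for each run of character data between tags.
    @Override
    public void handleText(char[] data, int pos) {
        if (text.length() > 0) {
            text.append(' ');   // separate adjacent text runs
        }
        text.append(data);
    }

    String getText() {
        return text.toString();
    }

    // Parses the HTML from the reader and returns the collected text.
    static String extract(Reader html) throws IOException {
        HtmlTextExtractor callback = new HtmlTextExtractor();
        // true = ignore any charset declared in the document
        new ParserDelegator().parse(html, callback, true);
        return callback.getText();
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><p>Hello <b>Lucene</b> users</p></body></html>";
        System.out.println(extract(new StringReader(html)));
    }
}
```

To treat particular tags specially (for example, to skip a block you do not want indexed), you could also override handleStartTag and handleEndTag and keep a flag that handleText checks.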

On Feb 1, 2005, at 3:14 AM, Jingkang Zhang wrote:

> Three HTML parsers (Lucene web application
> demo, CyberNeko HTML Parser, JTidy) are mentioned in
> Lucene FAQ
> 1.3.27. Which is the best? Can it filter tags that are
> auto-created by MS-word 'Save As HTML files' function?
Bill Tschumy
Otherwise -- Austin, TX
