lucene-java-user mailing list archives

From Chris Fraschetti
Subject a good html & script parser
Date Sat, 25 Sep 2004 06:23:07 GMT
Most of the HTML parsers I can find on the web handle only the <tag>
syntax and ignore the { code } syntax of the embedded scripts that
occur in a lot of web pages.

Is there a good library that returns the plain text of an HTML document
string and strips out more than just the <tag> occurrences?
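
For illustration only, here is a minimal sketch of the kind of stripping
described above, written with plain java.util.regex rather than any
particular parser library. The class name PlainTextSketch and the method
toPlainText are hypothetical, and regular expressions are not a robust
substitute for a real HTML parser (malformed markup and character entities
are not handled here).

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene or any existing library.
public class PlainTextSketch {
    // (?is): case-insensitive, and '.' also matches line breaks.
    // Drops <script>...</script> and <style>...</style> including their bodies,
    // which is where the { code } content usually lives.
    private static final Pattern SCRIPT_OR_STYLE =
        Pattern.compile("(?is)<(script|style)\\b[^>]*>.*?</\\1\\s*>");
    // HTML comments, which sometimes wrap inline scripts.
    private static final Pattern COMMENT = Pattern.compile("(?s)<!--.*?-->");
    // Any remaining <tag> markup.
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]+>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    public static String toPlainText(String html) {
        String s = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        s = COMMENT.matcher(s).replaceAll(" ");
        s = TAG.matcher(s).replaceAll(" ");
        // Collapse the whitespace left behind by the removals.
        return WHITESPACE.matcher(s).replaceAll(" ").trim();
    }
}
```

Under these assumptions, passing
"<style>p { color: red; }</style><p>Hello <b>world</b></p>" to
toPlainText would give back "Hello world".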

Chris Fraschetti
