lucene-java-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Html documents parsing
Date Tue, 20 Nov 2001 00:33:25 GMT
This question is similar to the Meta-Tags question posted to the list
earlier today (or was it yesterday?).  The Lucene distribution includes
a few simple applications that demonstrate what Lucene is capable of
and how it can be used.  But those demos are not really part of the
Lucene code itself.
It is up to you to write the application around Lucene, which in your
case would include parsing the HTML before handing the text to Lucene.

Perhaps JTidy (http://jtidy.sf.net/) could come in handy here...
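
To make the "parse first, then index" idea concrete, here is a minimal
sketch of the tag-stripping step.  Note that it is not JTidy and not
part of Lucene: as an assumption for illustration it uses the HTML
parser that ships with the JDK (javax.swing.text.html), and the class
and method names (HtmlTextExtractor, extractText) are made up for this
example.  The string it returns is what you would then put into a
Lucene field, so that tag names such as "body" never reach the index.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper, not part of Lucene or JTidy.
final class HtmlTextExtractor {

    private HtmlTextExtractor() {
    }

    /** Returns the text content of an HTML string, with the tags dropped. */
    static String extractText(String html) {
        final StringBuilder text = new StringBuilder();

        // The parser reports each run of character data through handleText;
        // tags arrive through other callbacks that are left as no-ops here.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                if (text.length() > 0) {
                    text.append(' '); // keep adjacent text runs apart
                }
                text.append(data);
            }
        };

        try (Reader reader = new StringReader(html)) {
            // true = ignore any charset declared inside the document
            new ParserDelegator().parse(reader, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }
}
```

Calling HtmlTextExtractor.extractText(...) on a page and indexing the
result, rather than the raw file, is one way to keep a search for
"body" from matching every HTML document.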

Otis


--- Emmanuel Bridonneau <EBridonneau@epicentric.com> wrote:
> I am confused about how Lucene performs the parsing of an HTML
> document. It
> doesn't do any tag stripping (or does it?). Consequently, does that
> mean it
> also indexes all HTML tags? If so, then a request for searching "body"
> will
> return any and all HTML documents previously indexed.
> I'd appreciate it if anyone could shed some light on FAQ.10 about
> indexing.
> 
> 
> --
> To unsubscribe, e-mail:  
> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
> <mailto:lucene-user-help@jakarta.apache.org>
> 

