lucene-java-user mailing list archives

From Kelvin Tan <kelvin-li...@relevanz.com>
Subject Re: <no-index> or <index>
Date Thu, 30 Jan 2003 12:42:07 GMT
My suggestion would be to modify HTMLParser to do the job. I don't think it
would be very difficult. I'm not aware of any existing HTML parsers that
support that functionality...
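
If changing the parser itself is more work than expected, a simpler route is to strip the tagged regions from the HTML before it reaches the parser. The following is only a rough sketch of that pre-filtering idea, not a patch to HTMLParser. The class name NoIndexFilter and the method stripNoIndex are made up for illustration, and it assumes the <no-index> blocks are not nested.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene or its demo HTMLParser.
final class NoIndexFilter {

    // (?i) ignores case, (?s) lets '.' match newlines so multi-line
    // blocks are matched. The reluctant '.*?' stops at the first closing
    // tag, so nested <no-index> blocks are NOT handled correctly.
    private static final Pattern NO_INDEX =
        Pattern.compile("(?is)<no-index\\b[^>]*>.*?</no-index\\s*>");

    // Returns the HTML with every <no-index>...</no-index> region
    // replaced by a single space, so neighbouring words do not run together.
    static String stripNoIndex(String html) {
        return NO_INDEX.matcher(html).replaceAll(" ");
    }

    public static void main(String[] args) {
        String html = "<p>Article text</p>"
            + "<no-index> International<br> Business<br> Science<br></no-index>"
            + "<p>More text</p>";
        // The filtered string is what would then be handed to the HTML parser.
        System.out.println(stripNoIndex(html));
    }
}
```

The filtered string can then be passed to whatever HTML parser is already in use, so the parser needs no changes. The drawback is that the whole document has to be held in memory as a String first.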


Regards,
Kelvin

--------
The book giving manifesto     - http://how.to/sharethisbook


On Thu, 30 Jan 2003 10:56:50 +0100, Michael Wechner said:
>Hi
>
>I am looking for an HTMLParser which skips text tagged by
>
><no-index>  or something similar. This way I could exclude for
>instance a "global navigation section" within the HTML
>
><no-index> International<br> Business<br> Science<br> ...
></no-index>
>
>It seems that the current demo/HTMLParser
>(http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.indexing&toc=faq#q11)
>is not capable of doing something like that.
>
>Any pointers are very welcome.
>
>Thanks a lot
>
>Michael
>
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
