lucene-java-user mailing list archives

From Michael Wechner <michael.wech...@wyona.org>
Subject Re: <no-index> or <index>
Date Fri, 31 Jan 2003 00:07:15 GMT
Kelvin Tan wrote:

>My suggestion would be to modify HTMLParser to do the job. Don't think it's 
>very difficult. I'm unaware of any existing HTML Parsers which support that 
>functionality...
>

Maybe Erik wants to include an "improved" version of my code snippet 
into CVS.

I guess I am not the only one wanting to exclude certain parts from an 
HTML page ;-)
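
For illustration only, here is a minimal sketch of the idea. It is not the code snippet mentioned above and not part of the Lucene demo HTMLParser. It assumes the page is available as a String and simply drops every <no-index>...</no-index> region before the text is handed to whatever HTML parser is used. The names NoIndexFilter and stripNoIndex are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not a Lucene class: removes <no-index> regions
// from an HTML string before it is parsed and indexed.
public class NoIndexFilter {

    // Non-greedy match from an opening <no-index> tag to the nearest
    // closing </no-index> tag; case-insensitive and spanning line breaks.
    private static final Pattern NO_INDEX = Pattern.compile(
            "<no-index\\b[^>]*>.*?</no-index\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Replaces each excluded region with a single space so that the
    // words on either side are not glued together.
    public static String stripNoIndex(String html) {
        return NO_INDEX.matcher(html).replaceAll(" ");
    }

    public static void main(String[] args) {
        String html = "<p>Article text</p>"
                + "<no-index> International<br> Business<br> Science<br></no-index>"
                + "<p>More text</p>";
        // prints: <p>Article text</p> <p>More text</p>
        System.out.println(stripNoIndex(html));
    }
}
```

A regular expression like this does not handle nested <no-index> elements, and an unclosed tag is left untouched; handling those cases would probably mean changing the parser itself, as Kelvin suggests.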

All the best

Michael

>
>
>Regards,
>Kelvin
>
>--------
>The book giving manifesto     - http://how.to/sharethisbook
>
>
>On Thu, 30 Jan 2003 10:56:50 +0100, Michael Wechner said:
>  
>
>>Hi
>>
>>I am looking for an HTMLParser which skips text tagged by
>>
>><no-index>  or something similar. This way I could exclude for
>>instance a "global navigation section" within the HTML
>>
>><no-index> International<br> Business<br> Science<br> ...
>></no-index>
>>
>>It seems that the current demo/HTMLParser
>>(http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.indexing&toc=faq#q11)
>>is not capable of doing something like that.
>>
>>Any pointers are very welcome.
>>
>>Thanks a lot
>>
>>Michael
>>
>>
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>    
>>




