lucene-solr-user mailing list archives

From Markus Jelsma <markus.jel...@openindex.io>
Subject RE: NorwegianLightStemFilterFactory and protected words
Date Fri, 01 Mar 2013 13:56:04 GMT
Hi Remi,

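The filter does not support protwords, but it does support the KeywordAttribute. Use the KeywordMarkerFilter to mark a list of words as keywords and protect them from stemming. In Solr, that means adding a KeywordMarkerFilterFactory with protected="protwords.txt" to the analyzer chain before the NorwegianLightStemFilterFactory, because the marker has to run before the stemmer.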

http://lucene.apache.org/core/4_1_0/analyzers-common/org/apache/lucene/analysis/miscellaneous/KeywordMarkerFilter.html
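The mechanism can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Lucene code: the Token class, keyword_marker, toy_light_stemmer and analyze are hypothetical names, and the toy suffix list only stands in for the real NorwegianLightStemmer rules. The keyword flag plays the role of KeywordAttribute. The marker runs first and flags protected words, and the stemmer leaves flagged tokens alone.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    keyword: bool = False  # stands in for Lucene's KeywordAttribute


def keyword_marker(tokens, protected):
    """Flag tokens whose text is in the protected set (the KeywordMarkerFilter idea)."""
    for tok in tokens:
        if tok.text in protected:
            tok.keyword = True
        yield tok


# Hypothetical suffixes, longest first. NOT the real NorwegianLightStemmer rules.
TOY_SUFFIXES = ("ere", "er", "e")


def toy_light_stemmer(tokens):
    """Strip the first matching suffix, unless the token is marked as a keyword."""
    for tok in tokens:
        if not tok.keyword:
            for suffix in TOY_SUFFIXES:
                # Keep at least three characters of the stem.
                if tok.text.endswith(suffix) and len(tok.text) > len(suffix) + 2:
                    tok.text = tok.text[: -len(suffix)]
                    break
        yield tok


def analyze(words, protected=frozenset()):
    """Run the chain in Solr order: keyword marker first, then the stemmer."""
    tokens = (Token(w) for w in words)
    return [t.text for t in toy_light_stemmer(keyword_marker(tokens, protected))]


if __name__ == "__main__":
    words = ["lærere", "lærer", "lære", "lær"]
    print(analyze(words))  # ['lær', 'lær', 'lær', 'lær']
    print(analyze(words, protected={"lærere", "lærer", "lære"}))
    # ['lærere', 'lærer', 'lære', 'lær']
```

Without a protected set, all four words collapse to "lær". With the three words protected, only "lær" itself remains as "lær". If the stemmer ran before the marker, the protected words would already be stemmed by the time they were checked, which is why the marker has to come first in the chain. The membership test here is case-sensitive. Solr's KeywordMarkerFilterFactory also has an ignoreCase option.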

Cheers,
Markus

 
 
-----Original message-----
> From: Remi Mikalsen <remi.mikalsen@iktsenteret.no>
> Sent: Fri 01-Mar-2013 14:46
> To: solr-user@lucene.apache.org
> Subject: NorwegianLightStemFilterFactory and protected words
> 
> While the NorwegianLightStemFilterFactory generally works very well, I have come across a few words I'd very much like not to stem.
> 
> The following words:
>  - lærere (teachers)
>  - lærer (teacher)
>  - lære (teach)
> 
> are all stemmed to, and therefore match:
>  - lær (leather)
> 
> I tried adding protected="protwords.txt" to my NorwegianLightStemFilterFactory filter and adding the following words to my protwords.txt file:
>  - lærere
>  - lærer
>  - lære
> 
> It didn't work (I use protwords.txt for other purposes and it works there). After looking around, it *seems* this particular FilterFactory doesn't support protwords the same way that, for example, SnowballPorterFilterFactory does.
> 
> Is there an alternative way to stop these words from being processed by the NorwegianLightStemFilterFactory?
> 
> 
> Regards,
> 
> -- 
> Remi Mikalsen
> Senter for IKT i utdanningen
> 
