lucene-solr-user mailing list archives

From Remi Mikalsen <remi.mikal...@iktsenteret.no>
Subject Re: NorwegianLightStemFilterFactory and protected words
Date Fri, 01 Mar 2013 14:38:46 GMT
Thanks for such a quick response!

I tried out the suggestion, but I'm struggling to actually make it work:

schema.xml:
 <filter class="org.apache.lucene.analysis.KeywordMarkerFilter" protected="protectedkeywords.txt" ignoreCase="false"/>

Produces an instantiation error:
 SEVERE: org.apache.solr.common.SolrException: Error instantiating class: 'org.apache.lucene.analysis.KeywordMarkerFilter
 ...
 Caused by: java.lang.InstantiationException: org.apache.lucene.analysis.KeywordMarkerFilter

I'm running Solr 3.6.1, and went looking here for more info:
 http://lucene.apache.org/solr/api-3_6_1/org/apache/solr/analysis/KeywordMarkerFilterFactory.html

The protectedkeywords.txt file has one line, is world-readable, sits in the same directory as
protwords.txt, and contains:
lærer

Any ideas on what is wrong?
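
One possibility, going by the page linked above, is that schema.xml expects the factory class
rather than the Lucene filter class itself. A minimal, untested sketch of what that might look
like follows. The field type name "text_no" and the tokenizer and lowercase filter are
assumptions for illustration only, as is the short "solr." class prefix; the marker filter is
placed before the stemmer so that the words are flagged before stemming runs.

```xml
<!-- Hypothetical field type; the name "text_no" is an assumption, not taken from my schema -->
<fieldType name="text_no" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Tokenizer and lowercasing are placeholders for whatever the real chain uses -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Factory class instead of org.apache.lucene.analysis.KeywordMarkerFilter;
         marks every word listed in protectedkeywords.txt as a keyword -->
    <filter class="solr.KeywordMarkerFilterFactory" protected="protectedkeywords.txt" ignoreCase="false"/>
    <!-- The stemmer comes after the marker, so marked words should be left unstemmed -->
    <filter class="solr.NorwegianLightStemFilterFactory"/>
  </analyzer>
</fieldType>
```

If this is the right direction, lærer should come out of the analysis chain unchanged, while
words not listed in the file are still stemmed.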

Regards,
Remi Mikalsen


----- Original message -----
> Hi Remi,
> 
> The filter does not support protwords but does support the
> KeywordAttribute. Use the KeywordMarkerFilter to mark a list of
> words and protect them from stemming.
> 
> http://lucene.apache.org/core/4_1_0/analyzers-common/org/apache/lucene/analysis/miscellaneous/KeywordMarkerFilter.html
> 
> Cheers,
> Markus
> 
>  
>  
> -----Original message-----
> > From:Remi Mikalsen <remi.mikalsen@iktsenteret.no>
> > Sent: Fri 01-Mar-2013 14:46
> > To: solr-user@lucene.apache.org
> > Subject: NorwegianLightStemFilterFactory and protected words
> > 
> > While the NorwegianLightStemFilterFactory generally works very
> > well, I have come across a few words I'd very much like not to
> > stem.
> > 
> > The following words:
> >  - lærere (teachers)
> >  - lærer (teacher)
> >  - lære (teach)
> > 
> > all match:
> >  - lær (leather)
> > 
> > I tried adding protected="protwords.txt" to my
> > NorwegianLightStemFilterFactory filter, and adding the following
> > words to my protwords.txt file:
> >  - lærere
> >  - lærer
> >  - lære
> > 
> > It didn't work (I use the protwords.txt for other purposes and it
> > works there). After looking around, it *seems* this particular
> > FilterFactory doesn't support protwords the same way that, for
> > example, SnowballPorterFilterFactory does.
> > 
> > I wonder if there is an alternative way to stop those words from
> > being processed by the NorwegianLightStemFilterFactory?
> > 
> > 
> > Regards,
> > 
> > --
> > Remi Mikalsen
> > Senter for IKT i utdanningen
> > 
> 
