lucene-solr-user mailing list archives

From "Bill Fowler" <>
Subject Re: Solr and KStem
Date Tue, 11 Sep 2007 00:33:16 GMT

I would like to test this and have a few questions (please excuse them if
they seem naive).

I would like to verify that this is purely a configuration change: since
the schema.xml defines the analysis/tokenizer chain, no other changes are
required.  Also, the source seems to say that a lower case factory needs to
be "farther down" the tokenizer chain.  Does this mean that the KStem
factory appears before the lower case filter factory in the schema.xml?  Is
there a recommended (required?) tokenizer factory?  I am using the
WhiteSpaceFactory, which seems OK.  Finally, I take it that I need to remove
the EnglishPorterFilterFactory item from the schema.xml, or not?
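For reference, a minimal sketch of a schema.xml field type using the plug-in, assuming KStem2.jar from the quoted message is in solr/lib. The fieldType name `text_kstem` is hypothetical. Placing LowerCaseFilterFactory before the stemmer is an assumption, based on KStem expecting already lower-cased tokens (as later Lucene versions of KStemFilter document). The chain uses KStemFilterFactory in place of EnglishPorterFilterFactory, so only one stemmer runs.

```xml
<!-- Hypothetical field type name; adjust to your schema. -->
<fieldType name="text_kstem" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Split on whitespace only -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Assumption: KStem expects lower-cased input, so lower-case first -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Replaces EnglishPorterFilterFactory; class and cacheSize as given in the quoted message -->
    <filter class="org.oclc.solr.analysis.KStemFilterFactory" cacheSize="20000"/>
  </analyzer>
</fieldType>
```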



On 9/10/07, Wagner,Harry <> wrote:
> Hi Yonik,
> The modified KStemmer source is attached. The original KStemFilter is
> now wrapped (and replaced) by KStemFilterFactory.  I also changed the
> path to avoid any naming collisions with existing Lucene code.
> I included the jar file also, for anyone who wants to just drop and
> play:
> - put KStem2.jar in your solr/lib directory.
> - change your schema to use:
>   <filter class="org.oclc.solr.analysis.KStemFilterFactory" cacheSize="20000"/>
> - restart your app server
> I don't know if you credit contributions, but if so please include OCLC.
> Seems only fair since I did this on their dime :)
> Cheers!
> harry
> -----Original Message-----
> From: [] On Behalf Of Yonik Seeley
> Sent: Friday, September 07, 2007 3:59 PM
> To:
> Subject: Re: Solr and KStem
> On 9/7/07, Wagner,Harry <> wrote:
> > I've implemented a Solr plug-in that wraps KStem for Solr use.  KStem is
> > considered to be more appropriate for library usage since it is much
> > less aggressive than Porter (e.g., searches for organization do NOT
> > match on organ!). If there is any interest in feeding this back into
> > Solr, I would be happy to contribute it.
> Absolutely.
> We need to make sure that the license for that k-stemmer is ASL
> compatible of course.
> -Yonik
