lucene-java-user mailing list archives

From Paul Hill <>
Subject RE: Stemming - limited index expansion
Date Tue, 12 Jun 2012 23:43:36 GMT
Thanks for the reply.

> -----Original Message-----
> From: Jack Krupansky []
> Sent: Tuesday, June 12, 2012 1:14 PM
> To:
> Subject: Re: Stemming - limited index expansion
> I don't completely follow precisely what you want to do, but the WordDelimiterFilter
> is an example of a token filter that outputs an extra token at the same position,
> such as with its

Thanks for directing me to that. I'm currently using 3.4, and it doesn't appear in the
3.6 code base.
If it doesn't show up until 4.0+ (your link is actually to 5.0!), I know that
   "Terms are no longer required to be character based. Lucene views a term as an arbitrary
But hopefully it is at the right level to suggest how this would be done using the old
CharRef instead of whatever the new stuff uses (ByteRef?).
I'll take a look.

> Maybe you simply want to internally call some existing stemmer filter and output both
> the original and stemmed term at the same location?

Yes, that is very close to what I want to do, with the possible addition of stemming only
a limited set of words (but more than just plurals).
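
To make the idea concrete, here is a minimal sketch in plain Java that does not use
Lucene at all. It only models the behaviour: each word is emitted with a position
increment of 1, and, for words in an allow-list, the stem is emitted as an extra token
with a position increment of 0 (the same position). The names LimitedStemExpansion,
Token, expand, and the stemmer argument are all hypothetical and chosen for
illustration. In an actual Lucene TokenFilter the same effect would come from setting
the PositionIncrementAttribute of the extra token to 0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.UnaryOperator;

/** Lucene-free model of "emit the original and the stem at the same position". */
final class LimitedStemExpansion {

    /** A term plus its position increment; 0 means "same position as the previous token". */
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return term + "(+" + positionIncrement + ")";
        }
    }

    /**
     * Emits every word unchanged. For words in the stemmable set, also emits the
     * stem stacked on the same position, but only when the stem differs from the word.
     * The stemmer is a stand-in (assumption) for whatever real stemmer is used.
     */
    static List<Token> expand(List<String> words,
                              Set<String> stemmable,
                              UnaryOperator<String> stemmer) {
        List<Token> out = new ArrayList<>();
        for (String word : words) {
            out.add(new Token(word, 1));          // original token advances the position
            if (stemmable.contains(word)) {
                String stem = stemmer.apply(word);
                if (!stem.equals(word)) {
                    out.add(new Token(stem, 0));  // stem shares the original's position
                }
            }
        }
        return out;
    }
}
```

With a toy stemmer that just strips a trailing "s", the words "dogs", "bus", "bark" and
an allow-list containing only "dogs" come out as dogs(+1), dog(+0), bus(+1), bark(+1):
"bus" is left alone because it is not on the list, which is the limited expansion I am
after.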

