lucene-solr-user mailing list archives

From Jonathan Rochkind
Subject Re: Providing token variants at index time
Date Thu, 22 Jul 2010 20:01:58 GMT
I think the Synonym filter should actually do exactly what you want, no?

Hmm, maybe not exactly what you want as you describe it. It comes close,
maybe close enough. Do you REALLY need to support "I Business M" or "I B
Machines" as source/query? Your spec suggests yes, and the synonym filter
won't easily do that. But if you just want "International Business
Machines" == "IBM", keeping positions intact for subsequent terms, I
think the synonym filter will do it.

If not, I suppose you could look at its source to write your own. Or
maybe there's some way to combine the PositionFilter with something else
to do it, but I can't figure one out.
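
If you do end up writing your own, the core idea is small: emit the
first variant of each group with a position increment of 1 and every
further variant with an increment of 0, so the variants stack on the
same position. Here is a rough sketch in plain Python, just to show the
bookkeeping. It is not Lucene code, and the function names
(tokenize_with_variants, positions) are made up for illustration; a real
version would be a Lucene TokenStream that sets the position increment
on each token.

```python
from typing import Dict, List, Tuple


def tokenize_with_variants(text: str) -> List[Tuple[str, int]]:
    """Return (term, position_increment) pairs for pipe-delimited input.

    Hypothetical helper, not part of Lucene or Solr. The first variant
    in each whitespace-separated group advances the position by 1; each
    further variant gets increment 0 and so shares that position.
    """
    tokens: List[Tuple[str, int]] = []
    for group in text.split():
        # Drop empty strings produced by stray or doubled pipes.
        variants = [v for v in group.split("|") if v]
        for i, variant in enumerate(variants):
            tokens.append((variant, 1 if i == 0 else 0))
    return tokens


def positions(tokens: List[Tuple[str, int]]) -> Dict[int, List[str]]:
    """Resolve increments into absolute positions: {position: [terms]}."""
    resolved: Dict[int, List[str]] = {}
    pos = -1  # the first increment of 1 moves this to position 0
    for term, increment in tokens:
        pos += increment
        resolved.setdefault(pos, []).append(term)
    return resolved


tokens = tokenize_with_variants("International|I Business|B Machines|M")
for position, terms in positions(tokens).items():
    print(position, terms)
```

Because every variant sits at the same position as its sibling, a phrase
query can mix them freely, which is what makes "I Business M" match as
well as "I B M".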


Paul Dlug wrote:
> Is there a tokenizer that supports providing variants of the tokens at
> index time? I'm looking for something that could take a syntax like:
> International|I Business|B Machines|M
> Which would take each pipe delimited token and preserve its position
> so that phrase queries work properly. The above would result in
> queries for "International Business Machines" as well as "I B M" or
> any variants. The point is that the variants would be generated
> externally as part of the indexing process so they may not be as
> simple as the above.
> Any ideas or do I have to write a custom tokenizer to do this?
> Thanks,
> Paul
