lucene-solr-user mailing list archives

From Paul Dlug <paul.d...@gmail.com>
Subject Re: Providing token variants at index time
Date Thu, 22 Jul 2010 20:22:53 GMT
On Thu, Jul 22, 2010 at 4:01 PM, Jonathan Rochkind <rochkind@jhu.edu> wrote:
> I think the Synonym filter should actually do exactly what you want, no?
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
>
> Hmm, maybe not exactly what you want as you describe it. It comes close,
> maybe good enough. Do you REALLY need to support "I Business M" or "I B
> Machines" as source/query? Your spec suggests yes, synonym filter won't
> easily do that. But if you just want "International Business Machines" ==
> "IBM", keeping positions intact for subsequent terms, I think synonym filter
> will do it.
> If not, I suppose you could look at its source to write your own. Or maybe
> there's some way to combine the PositionFilter with something else to do it,
> but I can't figure one out.

The synonym approach won't work because it requires me to provide the
variants in a file. The variants may be more dynamic and not known in
advance. The process creating the documents to index does have that
logic, though, and could easily put the variants into the document in
a format a tokenizer could pull apart later.
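
To make the idea concrete, here is a rough sketch of the kind of
encoding I have in mind. It is plain Python rather than a real Lucene
tokenizer, and the "|" delimiter and the expand_variants name are made
up for illustration, not anything Solr provides. Variants that share a
position are joined with "|", and every variant after the first gets a
position increment of 0 so it stacks on the same position, the way
synonym tokens do.

```python
from typing import List, Tuple

# Hypothetical delimiter chosen for this sketch; not a Solr convention.
VARIANT_SEP = "|"


def expand_variants(text: str) -> List[Tuple[str, int]]:
    """Split text into (term, position_increment) pairs.

    Whitespace separates positions. Within one position, variants are
    separated by VARIANT_SEP.
    """
    tokens: List[Tuple[str, int]] = []
    for slot in text.split():
        # Drop empty pieces produced by stray or doubled delimiters.
        variants = [v for v in slot.split(VARIANT_SEP) if v]
        for i, term in enumerate(variants):
            # The first variant advances the position by one; the
            # remaining variants stack on it with an increment of zero.
            tokens.append((term, 1 if i == 0 else 0))
    return tokens


if __name__ == "__main__":
    for term, increment in expand_variants("International|IBM Business Machines"):
        print(term, increment)
```

A real implementation would have to emit these as tokens with the
matching position increments inside a custom tokenizer or filter, and
would need a delimiter that cannot occur in the indexed text.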


--Paul
