lucene-solr-user mailing list archives

From Shalin Shekhar Mangar <shalinman...@gmail.com>
Subject Re: Question about PatternReplace filter and automatic Synonym generation
Date Mon, 05 Oct 2009 09:46:54 GMT
On Fri, Oct 2, 2009 at 11:31 PM, Prasanna Ranganathan <
pranganathan@netflix.com> wrote:

>
>  Does the PatternReplaceFilter have an option where you can keep the
> original token in addition to the modified token? From what I looked at it
> does not seem to but I want to confirm the same.
>
>
No, it does not.
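As a workaround, a custom filter could emit both forms. The following is a minimal plain-Java model of that idea, with no Lucene dependency. The class name `KeepOriginalPatternReplace` and its `Token` record are hypothetical and not part of Lucene or Solr. Each input token is kept. If the pattern changes it, the rewritten form is added with a position increment of 0, so it occupies the same position, the way synonym tokens are stacked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical model of a "pattern replace, but keep the original" filter.
public class KeepOriginalPatternReplace {

    /** A token's text plus its position increment (0 = same position as the previous token). */
    record Token(String text, int positionIncrement) {}

    private final Pattern pattern;
    private final String replacement;

    KeepOriginalPatternReplace(String regex, String replacement) {
        this.pattern = Pattern.compile(regex);
        this.replacement = replacement;
    }

    /** Emits every input token, followed by its rewritten form (if different) at the same position. */
    List<Token> filter(List<String> input) {
        List<Token> out = new ArrayList<>();
        for (String term : input) {
            out.add(new Token(term, 1)); // the original token always survives
            String replaced = pattern.matcher(term).replaceAll(replacement);
            if (!replaced.equals(term)) {
                out.add(new Token(replaced, 0)); // stacked on the original, like a synonym
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Example: strip hyphens, so "wi-fi" is indexed as both "wi-fi" and "wifi".
        KeepOriginalPatternReplace f = new KeepOriginalPatternReplace("-", "");
        for (Token t : f.filter(List.of("wi-fi", "router"))) {
            System.out.println(t.text() + " (posInc=" + t.positionIncrement() + ")");
        }
    }
}
```

In a real Lucene `TokenFilter`, the same effect would come from buffering the rewritten term and emitting it on the next `incrementToken()` call with its `PositionIncrementAttribute` set to 0. This sketch only models the resulting token sequence.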


> Alternatively, is there a filter available which takes in a pattern and
> produces additional forms of the token depending on the pattern? The use
> case I am looking at here is using such a filter to automate synonym
> generation. In our application, quite a few of the synonym file entries
> match a specific pattern and having such a filter would make it easier I
> believe. Please do correct me if I am missing some unwanted side-effect
> with this approach.
>
>
I do not understand this. TokenFilters are used for things like stemming,
replacing patterns, lowercasing, n-gramming etc. The synonym filter inserts
additional tokens (synonyms) from a file for each token.
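To make the synonym mechanism concrete, here is a simplified plain-Java model: the file is parsed once into an in-memory map, and each token is expanded into itself plus its equivalents. The class name `InMemorySynonyms` is hypothetical. The parser is an assumption-laden simplification that only handles comma-separated, single-word lines such as `tv, television`. The real synonym file format also supports `=>` mappings and multi-word entries, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of synonym expansion backed by an in-memory map.
public class InMemorySynonyms {

    private final Map<String, Set<String>> map = new HashMap<>();

    /** Parses comma-separated equivalence lines; blank lines and '#' comment lines are skipped. */
    InMemorySynonyms(List<String> lines) {
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) continue;
            List<String> group = new ArrayList<>();
            for (String s : trimmed.split(",")) group.add(s.trim());
            // Every term in the group maps to the whole group (insertion order preserved).
            for (String term : group) {
                map.computeIfAbsent(term, k -> new LinkedHashSet<>()).addAll(group);
            }
        }
    }

    /** Returns the token followed by its synonyms; a filter would stack these at one position. */
    List<String> expand(String token) {
        Set<String> group = map.get(token);
        if (group == null) return List.of(token);
        List<String> out = new ArrayList<>();
        out.add(token);
        for (String s : group) {
            if (!s.equals(token)) out.add(s);
        }
        return out;
    }

    public static void main(String[] args) {
        InMemorySynonyms syn = new InMemorySynonyms(List.of("# equivalents", "tv, television, telly"));
        System.out.println(syn.expand("television")); // [television, tv, telly]
        System.out.println(syn.expand("radio"));      // [radio]
    }
}
```

Because the whole map lives in memory and, for index-time synonyms, the expansion is baked into the indexed terms, a larger file costs memory, and editing it only affects documents indexed afterwards.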

What exactly are you trying to do with synonyms? I guess you could do
stemming etc. with synonyms, but why would you want to do that?


> Continuing on that line, what is the performance hit in having additional
> index-time filters as opposed to using a synonym file with more entries?
> How does the overhead of using a bigger synonym file compare with that of
> using additional filters?
>
>
Note that if synonyms are applied at index time, a change in the synonym
file requires re-indexing the affected documents. Also, the synonym map is
kept in memory.

-- 
Regards,
Shalin Shekhar Mangar.
