lucene-solr-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: Question about PatternReplace filter and automatic Synonym generation
Date Tue, 06 Oct 2009 22:32:07 GMT

:  I'll try to explain with an example. Given the term 'it!' in the title, it
: should match both 'it' and 'it!' in the query as an exact match. Currently,
: this is done by using a synonym entry  (and index time SynonymFilter) as
: follows:
: 
:  it! => it, it!
: 
:  Now, the above holds true for all cases where you have a title token of the
: form [a-zA-Z]*!. Handling all of those cases requires adding synonyms
: manually for each case which is not easy to manage and does not scale.
: 
:  I am hoping to do the same by using an index-time filter that takes in a
: pattern like the PatternReplace filter and adds the newly created token
: instead of replacing the original one. Does this make sense? Am I missing
: something that would break this approach?

something like this would be fairly easy to implement in Lucene, but 
somewhat confusing to try and configure in Solr.  I was going to suggest 
that you use something like...
 <filter class="solr.PatternReplaceFilterFactory"
                pattern="^((.*)!)$" replacement="$1 $2" replace="all" />

..and then have a subsequent filter that splits the tokens on the 
whitespace (or any other special character you could use in the 
replacement) ... but apparently we don't have any built-in filters that 
will just split tokens on a character/pattern for you.  that would also be 
fairly easy to write if someone wants to submit a patch.
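
To make the replace-then-split idea concrete, here is a minimal sketch in plain Java that works on bare strings rather than a real Lucene TokenStream. The class name TokenExpansionSketch, the method expand, and the regex ^((.*)!)$ are all assumptions made for illustration; they are not part of Solr or Lucene. Group 1 captures the whole token and group 2 captures the token without its trailing '!'.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only: mimics "pattern replace, then split on
// whitespace" using plain strings instead of Lucene tokens.
final class TokenExpansionSketch {

    // Assumed pattern: group 1 = whole token, group 2 = token minus trailing '!'.
    private static final Pattern TRAILING_BANG = Pattern.compile("^((.*)!)$");

    // Returns the original token plus its '!'-less variant when the token
    // ends in '!'; otherwise returns just the original token.
    static List<String> expand(String token) {
        // Step 1: what the pattern-replace step would produce, e.g. "it! it".
        String replaced = TRAILING_BANG.matcher(token).replaceAll("$1 $2");

        // Step 2: the missing "split on a character" step.
        List<String> out = new ArrayList<>();
        for (String part : replaced.split(" ")) {
            if (!part.isEmpty()) {
                out.add(part);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(expand("it!")); // prints [it!, it]
        System.out.println(expand("it"));  // prints [it]
    }
}
```

In a real index-time filter, the extra token would normally be emitted with a position increment of 0 so that it stacks on the original token, the same way SynonymFilter handles its output; this sketch leaves position handling out entirely.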


-Hoss

