lucene-solr-user mailing list archives

From "Vinay B," <vybe3...@gmail.com>
Subject Re: How to use the StandardTokenizer with currency
Date Tue, 06 Dec 2016 21:15:37 GMT
Yes, that works (apart from the typo — "PatternReplaceCharFiterFactory" should be "PatternReplaceCharFilterFactory").

Here is my config:

<!-- VB - Just like text_general, but supports $ currency matching and
     autoGeneratePhraseQueries -->
<fieldType name="text_curr_3" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\$"
                replacement="xxdollarxx"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="xxdollarxx"
            replacement="\$" replace="all"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1" types="word-delim-types.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\$"
                replacement="xxdollarxx"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="xxdollarxx"
            replacement="\$" replace="all"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1" types="word-delim-types.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
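
For anyone trying to follow what the analyzer chain does, the placeholder round-trip can be sketched outside Solr in a few lines of Python. This is only an illustration, not Solr's actual code: the `\w+` split is a rough stand-in for StandardTokenizer (which follows UAX#29 word-break rules, not `\w`), and the "xxdollarxx" placeholder is the same one used in the config above.

```python
import re

def index_analyze(text):
    # Char filter stage: protect "$" from the tokenizer by replacing it
    # with a placeholder that survives word-boundary tokenization.
    protected = re.sub(r"\$", "xxdollarxx", text)
    # Tokenizer stage: crude stand-in for StandardTokenizerFactory.
    tokens = re.findall(r"\w+", protected)
    # Token filter stage: restore "$" inside each emitted token.
    return [t.replace("xxdollarxx", "$") for t in tokens]

print(index_analyze("The item costs $100 today"))
# Without the placeholder round-trip, the "$" would be dropped and
# "$100" would be indexed as just "100".
```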

On Wed, Nov 30, 2016 at 2:08 PM, Steve Rowe <sarowe@gmail.com> wrote:

> Hi Vinay,
>
> You should be able to use a char filter to convert “$” characters into
> something that will survive tokenization, and then a token filter to
> convert it back.
>
> Something like this (untested):
>
>   <analyzer>
>     <charFilter class="solr.PatternReplaceCharFiterFactory"
>                 pattern="\$"
>                 replacement="__dollar__"/>
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.PatternReplaceFilterFactory"
>             pattern="__dollar__"
>             replacement="\$"
>             replace="all"/>
>   </analyzer>
>
> --
> Steve
> www.lucidworks.com
>
> > On Nov 30, 2016, at 1:58 PM, Vinay B, <vybe3142@gmail.com> wrote:
> >
> > Prior discussion at
> > http://stackoverflow.com/questions/40877567/using-
> standardtokenizerfactory-with-currency
> >
> > I'd like to keep the rest of the StandardTokenizer's behavior, but I'm
> > wondering whether doing what I want boils down to instructing the
> > StandardTokenizer not to discard the $ symbol? Or is there another way?
> > I'm hoping this is possible through configuration rather than code
> > changes.
> >
> > Thanks
>
>
