lucene-solr-user mailing list archives

From Webster Homer <webster.ho...@sial.com>
Subject Re: WordDelimiterFilterFactory with Wildcards
Date Wed, 26 Jul 2017 18:33:30 GMT
I checked the PatternReplaceFilterFactory and it is OK. I can't use
preserveOriginal, since it preserves the hyphens too, which I don't want.
It would be best if the filter didn't touch the * at all.
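
For anyone who wants to repeat the pattern check outside Solr, here is a minimal sketch in Python. It assumes Python's re module treats this particular pattern the same way Java's regex engine does (the pattern only uses basic constructs), and strip_non_cas is a hypothetical helper name, not part of Solr. It only approximates what replace="all" with an empty replacement does to a single KeywordTokenizer token.

```python
import re

# Pattern copied verbatim from the query analyzer's PatternReplaceFilterFactory.
NON_CAS_PATTERN = re.compile(r"^.*([^- 0-9*]+).*$")


def strip_non_cas(term: str) -> str:
    """Blank out any token containing a character other than
    hyphen, space, digit, or asterisk (hypothetical helper)."""
    return NON_CAS_PATTERN.sub("", term)


if __name__ == "__main__":
    # Print each sample term next to what survives the replacement.
    for term in ["1234-12-*", "1234 12 1", "1234121", "aspirin"]:
        print(repr(term), "->", repr(strip_non_cas(term)))
```

A term made up only of digits, hyphens, spaces, and * gives the pattern nothing to match, so 1234-12-* comes through unchanged, while a term such as aspirin is blanked out. That fits the observation that the pattern replace is not the step removing the wildcard.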

On Wed, Jul 26, 2017 at 1:30 PM, Saurabh Sethi <saurabh.sethi@sendgrid.com>
wrote:

> My guess is PatternReplaceFilterFactory is most likely the cause.
> Also, based on your query, you might want to set preserveOriginal=1
>
> You can take one filter out at a time and see which one is altering the
> query.
>
> On Wed, Jul 26, 2017 at 11:13 AM, Webster Homer <webster.homer@sial.com>
> wrote:
>
> > 1. KeywordTokenizer - we want to treat the entire field as a single term
> > to parse.
> > 2. preserveOriginal="0". I thought about changing this to 1.
> > 3. Solr 6.2.2
> >
> > This is the fieldtype
> >     <fieldType name="cas_num_tokenizer" class="solr.TextField" positionIncrementGap="100">
> >       <analyzer type="index">
> >         <tokenizer class="solr.KeywordTokenizerFactory"/>
> >         <filter class="solr.TrimFilterFactory"/>
> >         <filter class="solr.WordDelimiterFilterFactory"
> >                 generateWordParts="0"
> >                 splitOnCaseChange="0"
> >                 splitOnNumerics="1"
> >                 generateNumberParts="0"
> >                 catenateWords="0"
> >                 catenateNumbers="1"
> >                 catenateAll="0"
> >                 preserveOriginal="0"
> >                 stemEnglishPossessive="0"/>
> >       </analyzer>
> >       <analyzer type="query">
> >         <tokenizer class="solr.KeywordTokenizerFactory"/>
> >         <filter class="solr.TrimFilterFactory"/>
> >         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >                 ignoreCase="true" expand="true"
> >                 tokenizerFactory="solr.KeywordTokenizerFactory"/>
> >         <!-- remove non-cas queries and junk from synonyms -->
> >         <filter class="solr.PatternReplaceFilterFactory"
> >                 pattern="^.*([^- 0-9*]+).*$" replacement="" replace="all"/>
> >         <filter class="solr.WordDelimiterFilterFactory"
> >                 generateWordParts="0"
> >                 splitOnCaseChange="0"
> >                 splitOnNumerics="1"
> >                 generateNumberParts="0"
> >                 catenateWords="0"
> >                 catenateNumbers="1"
> >                 catenateAll="0"
> >                 preserveOriginal="0"
> >                 stemEnglishPossessive="0"/>
> >       </analyzer>
> >     </fieldType>
> >
> >
> > On Wed, Jul 26, 2017 at 12:56 PM, Saurabh Sethi <saurabh.sethi@sendgrid.com>
> > wrote:
> >
> > > 1. What tokenizer are you using?
> > > 2. Do you have preserveOriginal="1" flag set in your filter?
> > > 3. Which version of solr are you using?
> > >
> > > On Wed, Jul 26, 2017 at 10:48 AM, Webster Homer <webster.homer@sial.com>
> > > wrote:
> > >
> > > > I have several fieldtypes that use the WordDelimiterFilterFactory.
> > > >
> > > > We have a fieldtype for CAS numbers, which look like 1234-12-1: numbers
> > > > separated by hyphens. Users often leave out the hyphens and either use
> > > > spaces or just string the numbers together.
> > > >
> > > > The WDF seemed like a great solution, especially as it gave partial
> > > > matches. However, a query like 1234-12-* fails. The analyzer tool shows
> > > > the wildcard getting stripped off.
> > > > Is there any way to preserve the wildcard in the query analyzer when
> > > > using the WordDelimiterFilterFactory?
> > > >
> > > > --
> > > >
> > > >
> > > > This message and any attachment are confidential and may be privileged
> > > > or otherwise protected from disclosure. If you are not the intended
> > > > recipient, you must not copy this message or attachment or disclose the
> > > > contents to any other person. If you have received this transmission in
> > > > error, please notify the sender immediately and delete the message and
> > > > any attachment from your system. Merck KGaA, Darmstadt, Germany and any
> > > > of its subsidiaries do not accept liability for any omissions or errors
> > > > in this message which may arise as a result of E-Mail-transmission or
> > > > for damages resulting from any unauthorized changes of the content of
> > > > this message and any attachment thereto. Merck KGaA, Darmstadt, Germany
> > > > and any of its subsidiaries do not guarantee that this message is free
> > > > of viruses and does not accept liability for any damages caused by any
> > > > virus transmitted therewith.
> > > >
> > > > Click http://www.emdgroup.com/disclaimer to access the German, French,
> > > > Spanish and Portuguese versions of this disclaimer.
> > > >
> > >
> > >
> > >
> > > --
> > > Saurabh Sethi
> > > Principal Engineer I | Engineering
> > >
> >
>
>
>
> --
> Saurabh Sethi
> Principal Engineer I | Engineering
>
