lucene-solr-user mailing list archives

From Jamie Johnson <jej2...@gmail.com>
Subject Re: Including phonetic search in text field
Date Mon, 23 May 2011 20:42:09 GMT
Paul,

Do you have an example of how to enable this in the Solr config on the
default request handler?  Is it as simple as adding


    <str name="defType">edismax</str>
    <str name="qf">
      text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
    </str>

to the requestHandler named search?
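
For reference, a minimal sketch of where those two parameters would sit. It assumes the handler is the stock "search" handler from the Solr example solrconfig.xml and that the parameters go inside its "defaults" list; the class, default, echoParams and rows entries are carried over from that example config as an assumption and are not something stated in this thread.

```xml
<!-- Sketch only: assumes the "search" handler from the Solr example solrconfig.xml -->
<requestHandler name="search" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <!-- Use the extended DisMax parser unless a request overrides defType -->
    <str name="defType">edismax</str>
    <!-- Fields to search, each with its own boost -->
    <str name="qf">
      text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
    </str>
  </lst>
</requestHandler>
```

Values placed under "defaults" only apply when a request does not supply the same parameter, so a single query can still pass its own defType or qf.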

On Mon, May 23, 2011 at 4:18 PM, Jamie Johnson <jej2003@gmail.com> wrote:

> Ah, yes very helpful thanks Paul.  I knew there would be something that I
> broke :).  I will need to go back and consider the use cases and see which
> will and will not require exact matches.  Thanks again!
>
>
> I have never heard of DisMax so this is new to me as well but have found
> some posts about it.  I am sure this will generate other questions :)  Again
> thanks.
>
>
> On Mon, May 23, 2011 at 3:56 PM, Paul Libbrecht <paul@hoplahup.net> wrote:
>
>> Jamie,
>>
>> the problem with that is that you cannot do exact matching anymore.
>> For this reason, it is good style to have two fields, to use a query
>> expander such as dismax (prefer exact matches, and less phonetic matches),
>> and to only use that when you sort by score.
>>
>> hope it helps
>>
>> paul
>>
>>
>> On 23 May 2011 at 21:43, Jamie Johnson wrote:
>>
>> > I am new to Solr and am trying to determine the best way to take the text
>> > field type (the one in the example) and add phonetic searches to it.
>> > Currently I have done the following:
>> >
>> >    <fieldType name="text" class="solr.TextField"
>> >               positionIncrementGap="100" autoGeneratePhraseQueries="true">
>> >      <analyzer type="index">
>> >        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> >        <filter class="solr.DoubleMetaphoneFilterFactory"/>
>> >        <!-- in this example, we will only use synonyms at query time
>> >        <filter class="solr.SynonymFilterFactory"
>> >                synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
>> >        -->
>> >        <!-- Case insensitive stop word removal.
>> >          add enablePositionIncrements=true in both the index and query
>> >          analyzers to leave a 'gap' for more accurate phrase queries.
>> >        -->
>> >        <filter class="solr.StopFilterFactory"
>> >                ignoreCase="true"
>> >                words="stopwords.txt"
>> >                enablePositionIncrements="true"
>> >                />
>> >        <filter class="solr.WordDelimiterFilterFactory"
>> >                generateWordParts="1" generateNumberParts="1" catenateWords="1"
>> >                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>> >        <filter class="solr.LowerCaseFilterFactory"/>
>> >        <filter class="solr.KeywordMarkerFilterFactory"
>> >                protected="protwords.txt"/>
>> >        <filter class="solr.PorterStemFilterFactory"/>
>> >      </analyzer>
>> >      <analyzer type="query">
>> >        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> >        <filter class="solr.DoubleMetaphoneFilterFactory"/>
>> >        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>> >                ignoreCase="true" expand="true"/>
>> >        <filter class="solr.StopFilterFactory"
>> >                ignoreCase="true"
>> >                words="stopwords.txt"
>> >                enablePositionIncrements="true"
>> >                />
>> >        <filter class="solr.WordDelimiterFilterFactory"
>> >                generateWordParts="1" generateNumberParts="1" catenateWords="0"
>> >                catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>> >        <filter class="solr.LowerCaseFilterFactory"/>
>> >        <filter class="solr.KeywordMarkerFilterFactory"
>> >                protected="protwords.txt"/>
>> >        <filter class="solr.PorterStemFilterFactory"/>
>> >      </analyzer>
>> >    </fieldType>
>> >
>> > which seems to work.  Is this appropriate or is there a better way of doing
>> > this?  I had previously defined a custom phonetic field but that would mean
>> > for each field which I wanted to support a phonetic style search I would
>> > need to add an additional field.  Adding it to the text seemed much more
>> > elegant since it would work for all text fields.  Is there a reason not to
>> > do this (e.g. performance, index size, etc.)?  Any insight/guidance would be
>> > greatly appreciated.
>> >
>> > Also if anyone could point me to what exactly filters do (docs) I would
>> > appreciate it.  My assumption is that they inject additional tokens based on
>> > the specific filter class.  Am I correct?
>>
>>
>
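
The two-field approach Paul describes in the quoted reply (exact field boosted high, phonetic field boosted low) could be sketched roughly as follows. The "text_phonetic" type, the "name_phonetic" field, and the 0.3 boost are hypothetical names and values chosen for illustration; none of them come from the thread, and the "name" source field is assumed to be the one from the example schema.

```xml
<!-- schema.xml (sketch): a separate phonetic-only type, so the original
     "text" type keeps its exact/stemmed matching untouched -->
<fieldType name="text_phonetic" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- inject="false" keeps only the phonetic codes in this field -->
    <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
  </analyzer>
</fieldType>

<!-- Hypothetical companion field, filled automatically from "name" -->
<field name="name_phonetic" type="text_phonetic" indexed="true" stored="false"/>
<copyField source="name" dest="name_phonetic"/>

<!-- solrconfig.xml (sketch): with dismax/edismax, boost the exact field
     above the phonetic one so exact matches score higher -->
<str name="qf">name^1.2 name_phonetic^0.3</str>
```

This does cost one extra field per phonetically searchable source field, which is the trade-off raised in the original question, but it keeps exact matching available and lets the boosts decide how much weight sound-alike matches get.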
