lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Multi-word exact keyword case-insensitive search suggestions
Date Fri, 14 Jan 2011 20:31:08 GMT
This might work:

Define your field to use WhitespaceTokenizerFactory and LowerCaseFilterFactory.

Use a filter query referencing this field.
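
A minimal sketch of what such a field definition might look like in schema.xml. The type name "keyword_ws_lc" and the field attributes are assumptions for illustration; "keyphrase" is the field name that appears in the debug output quoted below.

```xml
<!-- schema.xml (sketch): "keyword_ws_lc" is a hypothetical type name -->
<fieldType name="keyword_ws_lc" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- split on whitespace only, then lowercase for case-insensitive matching -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- field attributes (indexed/stored/multiValued) are assumptions -->
<field name="keyphrase" type="keyword_ws_lc" indexed="true" stored="true" multiValued="true"/>
```

A request could then add a filter query such as fq=keyphrase:"smart mobile". Note that with a whitespace-tokenized field, that phrase would also match a longer keyword that contains it, such as "smart mobile phone".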

If you want the words to appear in their exact order, you could just define
the "pf" parameter in your dismax handler.
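
As a rough sketch, "pf" could be set as a default on a dismax handler in solrconfig.xml. The handler name "/search", the "qf" field list, and the boost value are all assumptions for illustration. Keep in mind that "pf" boosts documents in which the query terms appear together as a phrase; it does not by itself exclude documents that lack the phrase.

```xml
<!-- solrconfig.xml (sketch): handler name, qf fields, and boost are hypothetical -->
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- fields searched term by term -->
    <str name="qf">text keyphrase</str>
    <!-- phrase fields: boost documents where the whole query matches as a phrase -->
    <str name="pf">keyphrase^10</str>
  </lst>
</requestHandler>
```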

Best
Erick

On Thu, Jan 13, 2011 at 8:01 PM, Estrada Groups <
estrada.adam.groups@gmail.com> wrote:

> Ahhh...the fun of open source software ;-). Requires a ton of trial and
> error! I found what worked for me and figured it was worth passing it along.
> If you don't mind...when you sort everything out on your end, please post
> results for the rest of us to take a gander at.
>
> Cheers,
> Adam
>
> On Jan 13, 2011, at 9:08 PM, Chamnap Chhorn <chamnapchhorn@gmail.com>
> wrote:
>
> > Thanks for your reply. However, it doesn't work for my case at all. I think
> > it's a problem with the query parser or something else. It forces me to put
> > double quotes around the search query in order to get any results.
> >
> > <str name="rawquerystring">"sim 010"</str>
> > <str name="querystring">"sim 010"</str>
> > <str name="parsedquery">+DisjunctionMaxQuery((keyphrase:sim 010)) ()</str>
> > <str name="parsedquery_toString">+(keyphrase:sim 010) ()</str>
> >
> > <str name="rawquerystring">smart mobile</str>
> > <str name="querystring">smart mobile</str>
> > <str name="parsedquery">
> > +((DisjunctionMaxQuery((keyphrase:smart))
> > DisjunctionMaxQuery((keyphrase:mobile)))~2) ()
> > </str>
> > <str name="parsedquery_toString">+(((keyphrase:smart) (keyphrase:mobile))~2) ()</str>
> >
> > The intent here is to do a full-text search, part of which searches the
> > keyword field, so I can't put quotes around the query.
> >
> > On Thu, Jan 13, 2011 at 10:30 PM, Adam Estrada <
> > estrada.adam.groups@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> The following seems to work pretty well.
> >>
> >>   <fieldType name="text_ws" class="solr.TextField"
> >>       positionIncrementGap="100">
> >>     <analyzer>
> >>       <tokenizer class="solr.KeywordTokenizerFactory" />
> >>       <filter class="solr.ShingleFilterFactory"
> >>         maxShingleSize="4" outputUnigrams="true"
> >>         outputUnigramIfNoNgram="false" />
> >>     </analyzer>
> >>   </fieldType>
> >>
> >>   <!-- A text field that uses WordDelimiterFilter to enable splitting and
> >>       matching of words on case-change, alpha numeric boundaries, and
> >>       non-alphanumeric chars, so that a query of "wifi" or "wi fi" could
> >>       match a document containing "Wi-Fi".
> >>       Synonyms and stopwords are customized by external files, and
> >>       stemming is enabled.
> >>       The attribute autoGeneratePhraseQueries="true" (the default) causes
> >>       words that get split to form phrase queries. For example,
> >>       WordDelimiterFilter splitting text:pdp-11 will cause the parser
> >>       to generate text:"pdp 11" rather than (text:PDP OR text:11).
> >>       NOTE: autoGeneratePhraseQueries="true" tends to not work well for
> >>       non whitespace delimited languages.
> >>       -->
> >>   <fieldType name="text" class="solr.TextField" positionIncrementGap="100"
> >>       autoGeneratePhraseQueries="true">
> >>     <analyzer type="index">
> >>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>       <!-- in this example, we will only use synonyms at query time
> >>       <filter class="solr.SynonymFilterFactory"
> >>               synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
> >>       -->
> >>       <!-- Case insensitive stop word removal.
> >>         add enablePositionIncrements=true in both the index and query
> >>         analyzers to leave a 'gap' for more accurate phrase queries.
> >>       -->
> >>       <filter class="solr.StopFilterFactory"
> >>               ignoreCase="true"
> >>               words="stopwords.txt"
> >>               enablePositionIncrements="true"
> >>               />
> >>       <filter class="solr.WordDelimiterFilterFactory"
> >>               generateWordParts="1" generateNumberParts="1" catenateWords="1"
> >>               catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> >>       <filter class="solr.LowerCaseFilterFactory"/>
> >>       <filter class="solr.KeywordMarkerFilterFactory"
> >>               protected="protwords.txt"/>
> >>       <filter class="solr.PorterStemFilterFactory"/>
> >>     </analyzer>
> >>     <analyzer type="query">
> >>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>       <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >>               ignoreCase="true" expand="true"/>
> >>       <filter class="solr.StopFilterFactory"
> >>               ignoreCase="true"
> >>               words="stopwords.txt"
> >>               enablePositionIncrements="true"
> >>               />
> >>       <filter class="solr.WordDelimiterFilterFactory"
> >>               generateWordParts="1" generateNumberParts="1" catenateWords="0"
> >>               catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> >>       <filter class="solr.LowerCaseFilterFactory"/>
> >>       <filter class="solr.KeywordMarkerFilterFactory"
> >>               protected="protwords.txt"/>
> >>       <filter class="solr.PorterStemFilterFactory"/>
> >>     </analyzer>
> >>   </fieldType>
> >>
> >>   <copyField source="cat" dest="text"/>
> >>   <copyField source="subject" dest="text"/>
> >>   <copyField source="summary" dest="text"/>
> >>   <copyField source="cause" dest="text"/>
> >>   <copyField source="status" dest="text"/>
> >>   <copyField source="urgency" dest="text"/>
> >>
> >> I ingest the source fields as text_ws (I know I've changed it a bit) and
> >> then copy the field to text. This seems to do what you are asking for.
> >>
> >> Adam
> >>
> >> On Thu, Jan 13, 2011 at 12:05 AM, Chamnap Chhorn <chamnapchhorn@gmail.com>
> >> wrote:
> >>
> >>> Hi all,
> >>>
> >>> I've been stuck on exact keyword matching for several days. I hope you
> >>> can help me. Here is the scenario:
> >>>
> >>>  1. It needs to match multi-word keywords, case-insensitively
> >>>  2. Partial-word or single-word matching against this field is not allowed
> >>>
> >>> I want to know the field type definition for this field and a sample Solr
> >>> query. I need to combine this search with my full-text search, which uses
> >>> a dismax query.
> >>>
> >>> Thanks
> >>> --
> >>> Chhorn Chamnap
> >>> http://chamnapchhorn.blogspot.com/
> >>>
> >>
> >
> >
> >
> > --
> > Chhorn Chamnap
> > http://chamnapchhorn.blogspot.com/
>
