lucene-solr-user mailing list archives

From: Chamnap Chhorn <chamnapchh...@gmail.com>
Subject: Re: Multi-word exact keyword case-insensitive search suggestions
Date: Sat, 15 Jan 2011 03:01:39 GMT

Ahh, thanks, guys, for helping me!

Adam's solution doesn't work for me. Here are my field, field type, and
Solr query:

<fieldType name="text_keyword" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.ShingleFilterFactory"
            maxShingleSize="4" outputUnigrams="true"
            outputUnigramIfNoNgram="false" />
  </analyzer>
</fieldType>

<field name="keyphrase" type="text_keyword" indexed="true" stored="false"
       multiValued="true"/>

http://localhost:8081/solr/select?q=printing%20house&qf=keyphrase&debugQuery=on&defType=dismax

<str name="parsedquery">
+((DisjunctionMaxQuery((keyphrase:smart))
DisjunctionMaxQuery((keyphrase:mobile)))~2) ()
</str>
<str name="parsedquery_toString">+(((keyphrase:smart) (keyphrase:mobile))~2) ()</str>
<lst name="explain"/>

No results are found.
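The debug output suggests why this fails: the dismax parser splits the query on whitespace into one clause per word (`keyphrase:smart`, `keyphrase:mobile`), while `KeywordTokenizerFactory` indexes each whole keyphrase as a single token, so a one-word clause can never equal it. For the same reason the `ShingleFilterFactory` has only one token to work with and adds nothing. A minimal sketch of a case-insensitive whole-value field, assuming Solr's `KeywordTokenizerFactory`, `LowerCaseFilterFactory`, and `TrimFilterFactory`; the names `text_keyword_lc` and `keyphrase_exact` are hypothetical:

```xml
<!-- Hypothetical type: the whole field value becomes ONE lowercased token,
     so "Printing House" is indexed as the single term "printing house". -->
<fieldType name="text_keyword_lc" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <!-- Emit the entire input as a single token (no word splitting). -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Lowercase the token for case-insensitive matching. -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Strip leading/trailing whitespace from the value. -->
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field, populated from the existing keyphrase field. -->
<field name="keyphrase_exact" type="text_keyword_lc" indexed="true"
       stored="false" multiValued="true"/>
<copyField source="keyphrase" dest="keyphrase_exact"/>
```

Because dismax splits the user's query before analysis, a field like this only matches a multi-word keyphrase when the whole query string reaches it in one piece, for example as a quoted query (as with `"sim 010"` later in this thread) or through the dismax phrase-boost (`pf`) parameter.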

Erick's solution works for me. However, I can't use a filter query, since
this is part of a full-text search: with an fq, Solr would return only the
documents that match the query exactly. I want to show exact matches at the
top, followed by the documents that match partially.
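Erick's suggestion corresponds roughly to a field type like the following, sketched under the assumption that it feeds from the existing `keyphrase` field; `text_ws_lc` and `keyphrase_words` are hypothetical names. Because each word is indexed as its own lowercased token, a query for a single word such as `printing`, or for the same words in reverse order, also matches a document whose keyphrase is "printing house" unless the query is restricted to a phrase.

```xml
<!-- Hypothetical type following Erick's suggestion: split on whitespace,
     then lowercase each word, so "Printing House" -> "printing", "house". -->
<fieldType name="text_ws_lc" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical per-word copy of the keyphrase field. -->
<field name="keyphrase_words" type="text_ws_lc" indexed="true"
       stored="false" multiValued="true"/>
<copyField source="keyphrase" dest="keyphrase_words"/>
```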

The problem is that when the user searches for a single word (e.g.
"printing" from the keyword "printing house"), that document is also
included in the search results. The other problem is that if the user
searches with the words in reverse order (e.g. "house printing"), the
document is also found.
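One way to rank exact matches first without filtering out partial matches is to keep per-word fields in `qf` and add a phrase-boost field in `pf`, as Erick hinted: dismax adds the whole query as an optional phrase clause against the `pf` fields, so it only raises the score of documents that match it. A sketch of handler defaults in `solrconfig.xml`; the handler name `/keyword_search`, the fields `name`, `description`, `keyphrase_words`, and `keyphrase_exact`, and the boost values are all assumptions, with `keyphrase_exact` assumed to be tokenized by `KeywordTokenizerFactory` plus `LowerCaseFilterFactory` so that the phrase clause matches only a complete keyphrase:

```xml
<!-- Hypothetical handler: the query is matched word by word via qf,
     while pf adds an optional boost for documents whose keyphrase equals
     the whole query, so exact matches sort to the top. -->
<requestHandler name="/keyword_search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- Per-word matching keeps partial matches in the result set. -->
    <str name="qf">name^2 description keyphrase_words</str>
    <!-- Whole-query clause: for q=printing house, only documents whose
         keyphrase is exactly "printing house" (any case) get this boost;
         a query of "printing" or "house printing" does not boost them. -->
    <str name="pf">keyphrase_exact^10</str>
  </lst>
</requestHandler>
```

This trades exclusion for ranking: single-word and reverse-order matches still appear, but below the exact keyphrase matches.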

Cheers

On Sat, Jan 15, 2011 at 3:31 AM, Erick Erickson <erickerickson@gmail.com> wrote:

> This might work:
>
> Define your field to use WhitespaceTokenizer and LowerCaseFilterFactory
>
> Use a filter query referencing this field.
>
> If you wanted the words to appear in their exact order, you could just
> define the "pf" field in your dismax.
>
> Best
> Erick
>
> On Thu, Jan 13, 2011 at 8:01 PM, Estrada Groups <estrada.adam.groups@gmail.com> wrote:
>
> > Ahhh...the fun of open source software ;-). Requires a ton of trial and
> > error! I found what worked for me and figured it was worth passing it
> > along. If you don't mind...when you sort everything out on your end,
> > please post results for the rest of us to take a gander at.
> >
> > Cheers,
> > Adam
> >
> > On Jan 13, 2011, at 9:08 PM, Chamnap Chhorn <chamnapchhorn@gmail.com>
> > wrote:
> >
> > > Thanks for your reply. However, it doesn't work for my case at all. I
> > > think it's a problem with the query parser or something else. It forces
> > > me to put double quotes around the search query in order to get any
> > > results.
> > >
> > > <str name="rawquerystring">"sim 010"</str>
> > > <str name="querystring">"sim 010"</str>
> > > <str name="parsedquery">+DisjunctionMaxQuery((keyphrase:sim 010)) ()</str>
> > > <str name="parsedquery_toString">+(keyphrase:sim 010) ()</str>
> > >
> > > <str name="rawquerystring">smart mobile</str>
> > > <str name="querystring">smart mobile</str>
> > > <str name="parsedquery">
> > > +((DisjunctionMaxQuery((keyphrase:smart))
> > > DisjunctionMaxQuery((keyphrase:mobile)))~2) ()
> > > </str>
> > > <str name="parsedquery_toString">+(((keyphrase:smart) (keyphrase:mobile))~2) ()</str>
> > >
> > > The intent here is to do a full-text search, part of which searches the
> > > keyword field, so I can't put quotes around the query.
> > >
> > > On Thu, Jan 13, 2011 at 10:30 PM, Adam Estrada <estrada.adam.groups@gmail.com> wrote:
> > >
> > >> Hi,
> > >>
> > >> the following seems to work pretty well.
> > >>
> > >>   <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
> > >>     <analyzer>
> > >>       <tokenizer class="solr.KeywordTokenizerFactory" />
> > >>       <filter class="solr.ShingleFilterFactory"
> > >>               maxShingleSize="4" outputUnigrams="true"
> > >>               outputUnigramIfNoNgram="false" />
> > >>     </analyzer>
> > >>   </fieldType>
> > >>
> > >>   <!-- A text field that uses WordDelimiterFilter to enable splitting and
> > >>        matching of words on case-change, alpha numeric boundaries, and
> > >>        non-alphanumeric chars, so that a query of "wifi" or "wi fi" could
> > >>        match a document containing "Wi-Fi".
> > >>        Synonyms and stopwords are customized by external files, and
> > >>        stemming is enabled.
> > >>        The attribute autoGeneratePhraseQueries="true" (the default)
> > >>        causes words that get split to form phrase queries. For example,
> > >>        WordDelimiterFilter splitting text:pdp-11 will cause the parser
> > >>        to generate text:"pdp 11" rather than (text:PDP OR text:11).
> > >>        NOTE: autoGeneratePhraseQueries="true" tends to not work well for
> > >>        non whitespace delimited languages.
> > >>   -->
> > >>   <fieldType name="text" class="solr.TextField" positionIncrementGap="100"
> > >>              autoGeneratePhraseQueries="true">
> > >>     <analyzer type="index">
> > >>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > >>       <!-- in this example, we will only use synonyms at query time
> > >>       <filter class="solr.SynonymFilterFactory"
> > >>               synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
> > >>       -->
> > >>       <!-- Case insensitive stop word removal.
> > >>         add enablePositionIncrements=true in both the index and query
> > >>         analyzers to leave a 'gap' for more accurate phrase queries.
> > >>       -->
> > >>       <filter class="solr.StopFilterFactory"
> > >>               ignoreCase="true"
> > >>               words="stopwords.txt"
> > >>               enablePositionIncrements="true"
> > >>               />
> > >>       <filter class="solr.WordDelimiterFilterFactory"
> > >>               generateWordParts="1" generateNumberParts="1" catenateWords="1"
> > >>               catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> > >>       <filter class="solr.LowerCaseFilterFactory"/>
> > >>       <filter class="solr.KeywordMarkerFilterFactory"
> > >>               protected="protwords.txt"/>
> > >>       <filter class="solr.PorterStemFilterFactory"/>
> > >>     </analyzer>
> > >>     <analyzer type="query">
> > >>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > >>       <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> > >>               ignoreCase="true" expand="true"/>
> > >>       <filter class="solr.StopFilterFactory"
> > >>               ignoreCase="true"
> > >>               words="stopwords.txt"
> > >>               enablePositionIncrements="true"
> > >>               />
> > >>       <filter class="solr.WordDelimiterFilterFactory"
> > >>               generateWordParts="1" generateNumberParts="1" catenateWords="0"
> > >>               catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> > >>       <filter class="solr.LowerCaseFilterFactory"/>
> > >>       <filter class="solr.KeywordMarkerFilterFactory"
> > >>               protected="protwords.txt"/>
> > >>       <filter class="solr.PorterStemFilterFactory"/>
> > >>     </analyzer>
> > >>   </fieldType>
> > >>
> > >>   <copyField source="cat" dest="text"/>
> > >>   <copyField source="subject" dest="text"/>
> > >>   <copyField source="summary" dest="text"/>
> > >>   <copyField source="cause" dest="text"/>
> > >>   <copyField source="status" dest="text"/>
> > >>   <copyField source="urgency" dest="text"/>
> > >>
> > >> I ingest the source fields as text_ws (I know I've changed it a bit) and
> > >> then copy the field to text. This seems to do what you are asking for.
> > >>
> > >> Adam
> > >>
> > >> On Thu, Jan 13, 2011 at 12:05 AM, Chamnap Chhorn <chamnapchhorn@gmail.com> wrote:
> > >>
> > >>> Hi all,
> > >>>
> > >>> I've been stuck on exact keyword matching for several days. I hope you
> > >>> can help me. Here is the scenario:
> > >>>
> > >>>  1. It needs to match multi-word keywords, case-insensitively.
> > >>>  2. Partial-word or single-word matches on this field are not allowed.
> > >>>
> > >>> I'd like to know the field type definition for this field and a sample
> > >>> Solr query. I need to combine this search with my full-text search,
> > >>> which uses the dismax query parser.
> > >>>
> > >>> Thanks
> > >>> --
> > >>> Chhorn Chamnap
> > >>> http://chamnapchhorn.blogspot.com/
> > >>>
> > >>
> > >
> > >
> > >
> > > --
> > > Chhorn Chamnap
> > > http://chamnapchhorn.blogspot.com/
> >
>



-- 
Chhorn Chamnap
http://chamnapchhorn.blogspot.com/
