lucene-solr-user mailing list archives

From Chamnap Chhorn <chamnapchh...@gmail.com>
Subject Re: Multi-word exact keyword case-insensitive search suggestions
Date Fri, 14 Jan 2011 02:08:19 GMT
Thanks for your reply. However, it doesn't work for my case at all. I think
the problem is with the query parser or something else: it forces me to put
double quotes around the search query in order to get any results.

<str name="rawquerystring">"sim 010"</str>
<str name="querystring">"sim 010"</str>
<str name="parsedquery">+DisjunctionMaxQuery((keyphrase:sim 010)) ()</str>
<str name="parsedquery_toString">+(keyphrase:sim 010) ()</str>

<str name="rawquerystring">smart mobile</str>
<str name="querystring">smart mobile</str>
<str name="parsedquery">+((DisjunctionMaxQuery((keyphrase:smart)) DisjunctionMaxQuery((keyphrase:mobile)))~2) ()</str>
<str name="parsedquery_toString">+(((keyphrase:smart) (keyphrase:mobile))~2) ()</str>
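
The two debug outputs above fit the explanation that the query parser splits unquoted input on whitespace before the field's analyzer runs, so a field that indexes its whole value as a single token is queried for "smart" and "mobile" separately and never sees "smart mobile". Below is a toy Python model of that behaviour, not Solr's parser. It assumes the keyphrase field is analyzed like KeywordTokenizerFactory followed by LowerCaseFilterFactory (the real schema is not shown in this thread), and all function names are made up for illustration.

```python
# Toy model only: mimics the clause structure seen in the debug output,
# not Solr's actual query parser implementation.


def keyword_analyze(text: str) -> list[str]:
    """Assumed keyphrase analyzer: the whole input stays one token, lowercased
    (like KeywordTokenizerFactory + LowerCaseFilterFactory)."""
    return [text.lower()]


def parse_keyphrase_query(query: str, field: str = "keyphrase") -> list[str]:
    """Return one field:term clause per analyzed token."""
    if len(query) >= 2 and query.startswith('"') and query.endswith('"'):
        # Quoted: the whole phrase reaches the analyzer in one piece.
        chunks = [query[1:-1]]
    else:
        # Unquoted: split on whitespace *before* analysis, so each word
        # is analyzed and queried on its own.
        chunks = query.split()
    return [f"{field}:{tok}" for chunk in chunks for tok in keyword_analyze(chunk)]


def matches(clauses: list[str], indexed_value: str, field: str = "keyphrase") -> bool:
    """All clauses must match; the ~2 in the debug output is a
    minimum-should-match of 2 out of the 2 clauses."""
    indexed_terms = {f"{field}:{tok}" for tok in keyword_analyze(indexed_value)}
    return all(clause in indexed_terms for clause in clauses)


if __name__ == "__main__":
    print(parse_keyphrase_query('"sim 010"'))     # ['keyphrase:sim 010']
    print(parse_keyphrase_query("smart mobile"))  # ['keyphrase:smart', 'keyphrase:mobile']
    print(matches(parse_keyphrase_query("smart mobile"), "Smart Mobile"))    # False
    print(matches(parse_keyphrase_query('"smart mobile"'), "Smart Mobile"))  # True
```

In this model the indexed term is "smart mobile", so the two single-word clauses can never match it, while the quoted form produces exactly that term.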

The intent here is to do a full-text search, part of which searches the
keyword field, so I can't put quotes around the query.
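
Requirements 1 and 2 in the original question (quoted below) boil down to a whole-value, case-insensitive equality test on the keyword field. A minimal Python sketch of that predicate follows; treating "case-insensitive" as plain lowercasing and collapsing runs of whitespace are assumptions, and the function names are hypothetical.

```python
def normalize(value: str) -> str:
    """Lowercase and collapse runs of whitespace (assumed normalization)."""
    return " ".join(value.lower().split())


def is_exact_keyword_match(query: str, keyword: str) -> bool:
    """True only when the whole query equals the whole keyword, ignoring case.
    Single-word or partial overlaps do not count."""
    return normalize(query) == normalize(keyword)


if __name__ == "__main__":
    print(is_exact_keyword_match("smart mobile", "Smart Mobile"))        # True
    print(is_exact_keyword_match("smart", "Smart Mobile"))               # False
    print(is_exact_keyword_match("smart mobile phone", "Smart Mobile"))  # False
```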

On Thu, Jan 13, 2011 at 10:30 PM, Adam Estrada <estrada.adam.groups@gmail.com> wrote:

> Hi,
>
> The following seems to work pretty well.
>
>    <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
>      <analyzer>
>        <tokenizer class="solr.KeywordTokenizerFactory" />
>        <filter class="solr.ShingleFilterFactory"
>          maxShingleSize="4" outputUnigrams="true" outputUnigramIfNoNgram="false" />
>      </analyzer>
>    </fieldType>
>
>    <!-- A text field that uses WordDelimiterFilter to enable splitting and matching of
>        words on case-change, alpha numeric boundaries, and non-alphanumeric chars,
>        so that a query of "wifi" or "wi fi" could match a document containing "Wi-Fi".
>        Synonyms and stopwords are customized by external files, and stemming is enabled.
>        The attribute autoGeneratePhraseQueries="true" (the default) causes words that get split to
>        form phrase queries. For example, WordDelimiterFilter splitting text:pdp-11 will cause the parser
>        to generate text:"pdp 11" rather than (text:PDP OR text:11).
>        NOTE: autoGeneratePhraseQueries="true" tends to not work well for non whitespace delimited languages.
>        -->
>    <fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
>      <analyzer type="index">
>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>        <!-- in this example, we will only use synonyms at query time
>        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
>        -->
>        <!-- Case insensitive stop word removal.
>          add enablePositionIncrements=true in both the index and query
>          analyzers to leave a 'gap' for more accurate phrase queries.
>        -->
>        <filter class="solr.StopFilterFactory"
>                ignoreCase="true"
>                words="stopwords.txt"
>                enablePositionIncrements="true"
>                />
>        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
>                catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>        <filter class="solr.PorterStemFilterFactory"/>
>      </analyzer>
>      <analyzer type="query">
>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>        <filter class="solr.StopFilterFactory"
>                ignoreCase="true"
>                words="stopwords.txt"
>                enablePositionIncrements="true"
>                />
>        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
>                catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>        <filter class="solr.PorterStemFilterFactory"/>
>      </analyzer>
>    </fieldType>
>
>    <copyField source="cat" dest="text"/>
>    <copyField source="subject" dest="text"/>
>    <copyField source="summary" dest="text"/>
>    <copyField source="cause" dest="text"/>
>    <copyField source="status" dest="text"/>
>    <copyField source="urgency" dest="text"/>
>
> I ingest the source fields as text_ws (I know I've changed it a bit) and
> then copy them into the text field. This seems to do what you are asking for.
>
> Adam
>
> On Thu, Jan 13, 2011 at 12:05 AM, Chamnap Chhorn <chamnapchhorn@gmail.com> wrote:
>
> > Hi all,
> >
> > I've been stuck on exact keyword matching for several days. I hope you
> > can help me. Here is the scenario:
> >
> >   1. It needs to match multi-word keywords, case-insensitively.
> >   2. Partial-word or single-word matches against this field are not allowed.
> >
> > I'd like to know the field type definition for this field and a sample
> > Solr query. I need to combine this search with my full-text search, which
> > uses the dismax query parser.
> >
> > Thanks
> > --
> > Chhorn Chamnap
> > http://chamnapchhorn.blogspot.com/
> >
>
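
A rough Python model of what the quoted text_ws analyzer emits: KeywordTokenizerFactory passes the entire field value through as one token, so ShingleFilterFactory has no neighbouring tokens to join and, with outputUnigrams="true", simply re-emits that token. Because the chain has no LowerCaseFilterFactory, case is preserved. The model ignores positions, offsets, token types, the shingle separator setting and outputUnigramIfNoNgram, and the function names are hypothetical.

```python
def keyword_tokenizer(text: str) -> list[str]:
    # The whole input becomes exactly one token.
    return [text]


def shingle_filter(tokens: list[str], max_shingle_size: int = 4,
                   output_unigrams: bool = True) -> list[str]:
    # Simplified: for each position, emit the unigram (if enabled) followed by
    # word n-grams of size 2..max_shingle_size joined with a single space.
    out = []
    for i in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[i])
        for n in range(2, max_shingle_size + 1):
            if i + n <= len(tokens):
                out.append(" ".join(tokens[i:i + n]))
    return out


def text_ws_analyze(text: str) -> list[str]:
    # Keyword tokenizer followed by the shingle filter, as in the quoted type.
    return shingle_filter(keyword_tokenizer(text))


if __name__ == "__main__":
    print(text_ws_analyze("Smart Mobile"))  # ['Smart Mobile']
    print(shingle_filter(["smart", "mobile", "phone"]))
    # ['smart', 'smart mobile', 'smart mobile phone', 'mobile', 'mobile phone', 'phone']
```

Shingles only appear when the tokenizer produces several tokens, as the three-word call shows; with the keyword tokenizer the output is the raw value, so "smart mobile" and "Smart Mobile" would be indexed as different terms.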

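The comment on the quoted text type says WordDelimiterFilter lets "wifi" or "wi fi" match a document containing "Wi-Fi". A much-simplified Python model of that: whitespace tokenizing, splitting on non-alphanumeric characters and lower-to-upper case changes, adding the joined form at index time only (catenateWords="1" in the index analyzer, "0" in the query analyzer), then lowercasing. It only handles letters; it ignores stop words, synonyms, protected words, stemming and token positions, and it is not Lucene's implementation. The function names are hypothetical.

```python
import re


def word_delimiter(token: str, catenate_words: bool) -> list[str]:
    """Simplified WordDelimiterFilter for letters-only input."""
    parts = []
    # Split on non-alphanumeric characters ("Wi-Fi" -> "Wi", "Fi") ...
    for chunk in re.split(r"[^A-Za-z0-9]+", token):
        # ... then on lower-to-upper case changes ("WiFi" -> "Wi", "Fi").
        parts.extend(p for p in re.split(r"(?<=[a-z])(?=[A-Z])", chunk) if p)
    if catenate_words and len(parts) > 1:
        parts.append("".join(parts))  # joined form, like catenateWords="1"
    return parts


def analyze(text: str, catenate_words: bool) -> list[str]:
    """Whitespace tokenizer -> simplified word delimiter -> lowercase."""
    terms = []
    for token in text.split():
        terms.extend(p.lower() for p in word_delimiter(token, catenate_words))
    return terms


def toy_match(query: str, document: str) -> bool:
    """Every query term must appear among the document's index terms."""
    index_terms = set(analyze(document, catenate_words=True))  # index analyzer
    query_terms = analyze(query, catenate_words=False)         # query analyzer
    return bool(query_terms) and all(t in index_terms for t in query_terms)


if __name__ == "__main__":
    print(analyze("Wi-Fi", catenate_words=True))                    # ['wi', 'fi', 'wifi']
    print(toy_match("wifi", "Wi-Fi"), toy_match("wi fi", "Wi-Fi"))  # True True
```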

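On the copyField lines in the quoted reply: Solr copies the source field's original value into the destination before any analysis, so what ends up searchable in text is decided by the text type's analyzer, not by text_ws. A small Python model of that copying step, with the field list taken from the quoted schema and everything else (function name, dict layout) hypothetical:

```python
# copyField rules from the quoted schema snippet: (source, destination).
COPY_FIELDS = [
    ("cat", "text"), ("subject", "text"), ("summary", "text"),
    ("cause", "text"), ("status", "text"), ("urgency", "text"),
]


def apply_copy_fields(doc: dict[str, str]) -> dict[str, list[str]]:
    """Return every field's values after copyField, before any analysis.

    The raw source value (not its analyzed tokens) is appended to the
    destination field; the destination's own analyzer runs later.
    """
    fields: dict[str, list[str]] = {name: [value] for name, value in doc.items()}
    for source, dest in COPY_FIELDS:
        if source in doc:
            fields.setdefault(dest, []).append(doc[source])
    return fields


if __name__ == "__main__":
    print(apply_copy_fields({"subject": "Wi-Fi outage", "status": "Open"}))
    # {'subject': ['Wi-Fi outage'], 'status': ['Open'], 'text': ['Wi-Fi outage', 'Open']}
```

Each value collected in text would then go through the text analyzer from the quoted schema.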

-- 
Chhorn Chamnap
http://chamnapchhorn.blogspot.com/
