lucene-solr-user mailing list archives

From Shawn Heisey <s...@elyograg.org>
Subject Re: Help to Understand a Solr Query
Date Fri, 16 May 2014 14:47:13 GMT
On 5/13/2014 8:56 AM, nativecoder wrote:
> <field name="Exact_Word" omitPositions="true" termVectors="false"
> omitTermFreqAndPositions="true" compressed="true" type="string_ci"
> multiValued="false" indexed="true" stored="true" required="false"
> omitNorms="true"/>
> 
> <field name="Word" compressed="true" type="email_text_ptn"
> multiValued="false" indexed="true" stored="true" required="false"
> omitNorms="true"/>
> 
> <fieldtype name="string_ci" class="solr.TextField" sortMissingLast="true"
> omitNorms="true"><analyzer><tokenizer
> class="solr.KeywordTokenizerFactory"/><filter
> class="solr.LowerCaseFilterFactory"/></analyzer></fieldtype>
> 
> <copyField source="Word" dest="Exact_Word"/>
> 
> As you can see Exact_Word has the KeywordTokenizerFactory and that should
> treat the string as it is.
> 
> Following is my responseHeader. As you can see, I am searching for my string
> only in the field Exact_Word and expecting it to return the Word field and
> the score.
> 
> "responseHeader":{
>     "status":0,
>     "QTime":14,
>     "params":{
>       "explainOther":"",
>       "fl":"Word,score",
>       "debugQuery":"on",
>       "indent":"on",
>       "start":"0",
>       "q":"d!sdasdsdwasd!asd@dsadsadas.edu",
>       "qf":"Exact_Word",
>       "wt":"json",
>       "fq":"",
>       "version":"2.2",
>       "rows":"10"}},
> 
> 
> But when I enter email with the following string
> "d!sdasdsdwasdasd@dsadsadas.edu" it splits the string to two. I was under
> the impression that KeywordTokenizerFactory will treat the string as it is.
> 
> Following is the query debug result. There you can see it has split the word:
>  "parsedquery":"+((DisjunctionMaxQuery((Exact_Word:d))
> -DisjunctionMaxQuery((Exact_Word:sdasdsdwasdasd@dsadsadas.edu)))~1)",
> 
> Can someone please tell me why it produces this query result?
> 
> If I put a string without the "!" sign as below, the produced query will be
> as below
> 
> "parsedquery":"+DisjunctionMaxQuery((Exact_Word:d_sdasdsdwasd_asd@dsadsadas.edu))",
> This is what I expected Solr to produce even with the "!" mark. With the "_"
> mark it won't split the string and treats the string as it is.
> 
> I thought if the KeywordTokenizerFactory is applied then it should return
> the exact string as it is
> 
> Please help me to understand what is going wrong here 

I cannot make Solr (4.7.2) behave this way with exclamation points.  I
tried debugQuery=true, using the standard query parser with df set to
the field as well as setting the qf parameter on the dismax parser and
the edismax parser.  None of these will split the string like what shows
up in your debugQuery.
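
For anyone who wants to repeat the three checks described above, here is a
minimal sketch that only builds the request URLs; it does not contact Solr.
The host, port and core name in BASE_URL are assumptions, not taken from this
thread, and the query string is the "q" value from the quoted responseHeader.
The parameter names (q, df, qf, defType, debugQuery, wt) are standard Solr
request parameters.

```python
from urllib.parse import urlencode

# Hypothetical Solr location; adjust host, port and core name to your install.
BASE_URL = "http://localhost:8983/solr/collection1/select"
QUERY = "d!sdasdsdwasd!asd@dsadsadas.edu"  # "q" value from the quoted responseHeader
FIELD = "Exact_Word"


def build_debug_urls(base_url=BASE_URL, query=QUERY, field=FIELD):
    """Return one debugQuery URL per query parser configuration."""
    common = {"q": query, "debugQuery": "true", "wt": "json"}
    variants = {
        # Standard query parser, default field set with df.
        "standard": {"df": field},
        # dismax and edismax parsers, field supplied through qf.
        "dismax": {"defType": "dismax", "qf": field},
        "edismax": {"defType": "edismax", "qf": field},
    }
    return {
        name: base_url + "?" + urlencode({**common, **extra})
        for name, extra in variants.items()
    }


if __name__ == "__main__":
    for name, url in build_debug_urls().items():
        print(name, url)
```

Comparing the "parsedquery" entry in the debug section of each response should
show whether any of the parsers splits the string on your version.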

Here's a screenshot of the analysis screen for a similar fieldType with
your input data:

https://www.dropbox.com/s/0v2lbc76h9wejw1/lowercase-analysis.png

KT is the KeywordTokenizer.  ICUFF is the ICUFoldingFilter -- lowercase
on steroids.  TF is the TrimFilter.
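
As a rough illustration of what that chain does, the following plain-Python
sketch imitates a keyword tokenizer followed by lowercasing and trimming. It
is not Solr or Lucene code, the function names are made up for this example,
and str.lower() is only a stand-in for ICUFoldingFilter, which folds far more
than case.

```python
def keyword_tokenizer(text):
    """Imitate KeywordTokenizer: the whole input becomes a single token."""
    return [text]


def lowercase_filter(tokens):
    """Simplified stand-in for a lowercase/folding filter."""
    return [token.lower() for token in tokens]


def trim_filter(tokens):
    """Imitate TrimFilter: strip leading and trailing whitespace."""
    return [token.strip() for token in tokens]


def analyze(text):
    """Run the three stages in order and return the resulting tokens."""
    return trim_filter(lowercase_filter(keyword_tokenizer(text)))


if __name__ == "__main__":
    print(analyze("D!sdasdsdwasd!asd@dsadsadas.edu"))
```

In this model the exclamation points never cause a split, because nothing in
the chain looks inside the token. Any split seen in a parsed query would
therefore have to happen before the text reaches analysis.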

Restating what Jack said in his reply:

http://people.apache.org/~hossman/#threadhijack

Thanks,
Shawn

