lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Flexible search field analyser/tokenizer configuration
Date Wed, 01 Oct 2014 19:23:45 GMT
There's some confusion here.

First of all, you shouldn't be getting docs like "The Wall" at all,
_assuming_ your fq clause is meant to only include docs with
"the Royal Garden" in the results list. What's happening here is
that your fq has no field prefix, so the text is searched for in the
default search field, which is set by the "df" parameter of the
/select request handler in your solrconfig.xml file.
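For reference, a minimal sketch of where df lives: in the stock Solr 4.x example config it sits in the /select handler's defaults list. The field name `title` is an assumption, taken from the `fl=id,title` in the query quoted below; substitute whichever field you want unprefixed queries to search.

```xml
<!-- solrconfig.xml: the /select handler. Query text without a
     field prefix (in q or fq) is searched against "df". -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <!-- Assumption: "title" is the field to search by default -->
    <str name="df">title</str>
  </lst>
</requestHandler>
```

Naming the field explicitly in the filter, e.g. `fq=title:(The Royal Garden)`, avoids depending on df at all.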

If that's not germane, then I suspect two things:
1> you don't have your stopwords set up properly in the
<fieldType> definition for the field in question
2> your default operator is OR, in which case try
fq=the AND Royal AND Garden
or set the default operator to AND (q.op in the /select
request handler in this case).
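A minimal sketch of item 1 in schema.xml, assuming a hypothetical type name `text_title`. A single `<analyzer>` applies the same chain at index and query time, so "The" is dropped on both sides. The referenced stopwords.txt (in the core's conf directory) must actually list the words to drop, one per line, and changing the analysis chain requires reindexing. For item 2, adding `<str name="q.op">AND</str>` to the defaults list of the /select handler sets the operator.

```xml
<!-- schema.xml: hypothetical field type for titles.
     One <analyzer> is used for both indexing and querying. -->
<fieldType name="text_title" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- break the text into word tokens -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- fold case so "The" and "the" become the same term -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- drop every word listed in conf/stopwords.txt (e.g. the, a, in) -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>
```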

Second, the fact that these words appear in the returned docs is
totally irrelevant to the search process. The stored text returned
is a verbatim copy of the text sent in at index time. The _indexed_
terms that are actually searched against may or may not match
this text exactly, i.e. the indexed terms may have stopwords
removed, case folded, stemming applied, etc.
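The stored/indexed distinction shows up directly in the field declaration. In this sketch the field name `title` and the type name `text_title` are assumptions, and the type is assumed to lowercase and remove stopwords. The Analysis screen in the Solr admin UI shows exactly which terms a given piece of text produces for a field type.

```xml
<!-- schema.xml: hypothetical declaration of the title field.
     stored="true":  the original text is kept verbatim; this is what fl returns.
     indexed="true": the analyzed terms (e.g. "royal" and "garden" for
                     "The Royal Garden", if "the" is a stopword) are what
                     queries are matched against. -->
<field name="title" type="text_title" indexed="true" stored="true"/>
```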

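Finally, by using the fq clause combined with the *:* query,
you are completely bypassing ranking. The *:* query is a
"match all docs query", which gives every doc the same constant
score, and fq clauses don't contribute to score by definition.
With all scores tied, docs come back in internal Lucene doc order,
which is why the exact match can end up at the bottom. To rank
results, put the search terms in q instead of fq.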
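One way to get ranked results, sketched here as an assumption rather than the only option, is the edismax query parser: the terms go in q, qf says which field to match them against, and pf adds a boost when the terms also appear together as a phrase, which should push "The Royal Garden" above a partial match like "Royal". The handler name `/titlesearch` and the boost of 10 are hypothetical; the same parameters can also be passed on the request itself.

```xml
<!-- solrconfig.xml: hypothetical handler that scores title matches -->
<requestHandler name="/titlesearch" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- field(s) the terms in q are matched against -->
    <str name="qf">title</str>
    <!-- extra boost when the query terms also occur as a phrase in title -->
    <str name="pf">title^10</str>
    <!-- return the score so the ranking is visible -->
    <str name="fl">id,title,score</str>
  </lst>
</requestHandler>
```

A request such as `http://localhost:8983/solr/bm/titlesearch?q=The+Royal+Garden` would then return docs sorted by score. If "the" is not removed as a stopword, docs like "The Wall" will still match on "the", although they should score below the exact match.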

Best,
Erick

On Wed, Oct 1, 2014 at 11:52 AM, PeterKerk <petervdkerk@hotmail.com> wrote:
> Ok, I missed the Query tab where I can do the actual site search :)
>
> I've also used your links, but even with those I fail to grasp why the
> following is happening:
>
> This is my query:
> http://localhost:8983/solr/bm/select?q=*%3A*&fq=The+Royal+Garden&rows=50&fl=id%2Ctitle&wt=xml&indent=true
>
>
> And below the result.
> Notice how results that have "the" in their title are also returned. Words
> like "the", "a" and "in" are, in general, words I wish to ignore IF the rest of
> the title does not match.
> And now with my query "The Royal Garden", I have a result that is an exact
> match on all 3 words, but that result is listed all the way at the bottom.
> How can I:
>
> a) make sure that items that share only the words I want to ignore ("the",
> "a", etc.) are not returned
> b) make sure that the exact match is at the top of the results, followed by
> the partial matches, so that the 1st result would be "The Royal
> Garden" and the 2nd result would be "Royal"
>
> Thanks!
>
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
>
> <lst name="responseHeader">
>   <int name="status">0</int>
>   <int name="QTime">1</int>
>   <lst name="params">
>     <str name="fl">id,title</str>
>     <str name="indent">true</str>
>     <str name="q">*:*</str>
>     <str name="_">1412188632532</str>
>     <str name="wt">xml</str>
>     <str name="fq">The Royal Garden</str>
>     <str name="rows">60</str>
>   </lst>
> </lst>
> <result name="response" numFound="9" start="0">
>   <doc>
>     <str name="id">1579</str>
>     <str name="title">Royal</str></doc>
>   <doc>
>     <str name="id">1603</str>
>     <str name="title">The Blue Lagoon</str></doc>
>   <doc>
>     <str name="id">1629</str>
>     <str name="title">The Nightingale DJ Light Sound Vision</str></doc>
>   <doc>
>     <str name="id">1648</str>
>     <str name="title">The Swingmasters</str></doc>
>   <doc>
>     <str name="id">2431</str>
>     <str name="title">The Cover Band</str></doc>
>   <doc>
>     <str name="id">2457</str>
>     <str name="title">The Teahouse Company</str></doc>
>   <doc>
>     <str name="id">2493</str>
>     <str name="title">The Task - Ultimate Party Band</str></doc>
>   <doc>
>     <str name="id">2499</str>
>     <str name="title">The Royal Garden</str></doc>
>   <doc>
>     <str name="id">2500</str>
>     <str name="title">The Wall</str></doc>
> </result>
> </response>
>
>
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Flexible-search-field-analyser-tokenizer-configuration-tp4161624p4162174.html
> Sent from the Solr - User mailing list archive at Nabble.com.
