lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Question about field boost
Date Tue, 23 Jul 2013 23:39:35 GMT
Bah! I didn't notice that you'd used edismax. Ignore
my comments.

Sorry for the confusion
Erick

On Tue, Jul 23, 2013 at 2:34 PM, Joe Zhang <smartagent@gmail.com> wrote:
> I'm not sure I understand, Erick. I don't have a "text" field in my schema;
> "title" and "content" are both legal fields.
>
>
> On Tue, Jul 23, 2013 at 5:15 AM, Erick Erickson <erickerickson@gmail.com> wrote:
>
>> this isn't doing what you think.
>> title^10 content
>> is actually parsed as
>>
>> text:title^10 text:content
>>
>> where "text" is my default search field.
>>
>> assuming title is a field. If you look a little
>> farther up the debug output you'll see that.
>>
>> You probably want
>> title:content^100 or some such?
>>
>> Erick
>>
>> On Tue, Jul 23, 2013 at 1:43 AM, Jack Krupansky <jack@basetechnology.com>
>> wrote:
>> > That means that in that document "china" occurs in the title, whereas
>> > "snowden" is found in the document but not in the title.
>> >
>> >
>> > -- Jack Krupansky
>> >
>> > -----Original Message----- From: Joe Zhang
>> > Sent: Tuesday, July 23, 2013 12:52 AM
>> > To: solr-user@lucene.apache.org
>> > Subject: Re: Question about field boost
>> >
>> >
>> > Is my reading correct that the boost is only applied on "china" but not
>> > "snowden"? How can that be?
>> >
>> > My query is: q=china+snowden&qf=title^10 content
>> >
>> >
>> > On Mon, Jul 22, 2013 at 9:43 PM, Joe Zhang <smartagent@gmail.com> wrote:
>> >
>> >> Thanks for your hint, Jack. Here are the debug results, which I'm having
>> >> a hard time deciphering (the two terms are "china" and "snowden")...
>> >>
>> >> 0.26839527 = (MATCH) sum of:
>> >>   0.26839527 = (MATCH) sum of:
>> >>     0.26757246 = (MATCH) max of:
>> >>       7.9147343E-4 = (MATCH) weight(content:china in 249), product of:
>> >>         0.019873314 = queryWeight(content:china), product of:
>> >>           1.6649085 = idf(docFreq=46832, maxDocs=91058)
>> >>           0.01193658 = queryNorm
>> >>         0.039825942 = (MATCH) fieldWeight(content:china in 249), product of:
>> >>           4.8989797 = tf(termFreq(content:china)=24)
>> >>           1.6649085 = idf(docFreq=46832, maxDocs=91058)
>> >>           0.0048828125 = fieldNorm(field=content, doc=249)
>> >>       0.26757246 = (MATCH) weight(title:china^10.0 in 249), product of:
>> >>         0.5836803 = queryWeight(title:china^10.0), product of:
>> >>           10.0 = boost
>> >>           4.8898454 = idf(docFreq=1861, maxDocs=91058)
>> >>           0.01193658 = queryNorm
>> >>         0.45842302 = (MATCH) fieldWeight(title:china in 249), product of:
>> >>           1.0 = tf(termFreq(title:china)=1)
>> >>           4.8898454 = idf(docFreq=1861, maxDocs=91058)
>> >>           0.09375 = fieldNorm(field=title, doc=249)
>> >>     8.2282536E-4 = (MATCH) max of:
>> >>       8.2282536E-4 = (MATCH) weight(content:snowden in 249), product of:
>> >>         0.03407834 = queryWeight(content:snowden), product of:
>> >>           2.8549502 = idf(docFreq=14246, maxDocs=91058)
>> >>           0.01193658 = queryNorm
>> >>         0.024145111 = (MATCH) fieldWeight(content:snowden in 249), product of:
>> >>           1.7320508 = tf(termFreq(content:snowden)=3)
>> >>           2.8549502 = idf(docFreq=14246, maxDocs=91058)
>> >>           0.0048828125 = fieldNorm(field=content, doc=249)
>> >>
>> >>
>> >> On Mon, Jul 22, 2013 at 9:27 PM, Jack Krupansky
>> >> <jack@basetechnology.com> wrote:
>> >>
>> >>> Maybe you're not doing anything wrong - other than having an artificial
>> >>> expectation of what the true relevance of your data actually is. Many
>> >>> factors go into relevance scoring. You need to look at all aspects of
>> >>> your data.
>> >>>
>> >>> Maybe your terms don't occur in your titles the way you think they do.
>> >>>
>> >>> Maybe you need a boost of 500 or more...
>> >>>
>> >>> Lots of potential maybes.
>> >>>
>> >>> Relevancy tuning is an art and craft, hardly a science.
>> >>>
>> >>> Step one: Know your data, inside and out.
>> >>>
>> >>> Use the debugQuery=true parameter on your queries and see how much of
>> >>> the score is dominated by your query terms in the non-title fields.
>> >>>
>> >>> -- Jack Krupansky
>> >>>
>> >>> -----Original Message----- From: Joe Zhang
>> >>> Sent: Monday, July 22, 2013 11:06 PM
>> >>> To: solr-user@lucene.apache.org
>> >>> Subject: Question about field boost
>> >>>
>> >>>
>> >>> Dear Solr experts:
>> >>>
>> >>> Here is my query:
>> >>>
>> >>> defType=dismax&q=term1+term2&qf=title^100 content
>> >>>
>> >>> My intention (or so I thought) is clearly to boost the title field.
>> >>> While I'm getting some non-trivial results, I'm surprised that the
>> >>> documents with both term1 and term2 in the title (I know such docs do
>> >>> exist in my repository) were not returned (or maybe were ranked very
>> >>> low). The situation does not change even when I use much larger boost
>> >>> factors.
>> >>>
>> >>> What am I doing wrong?
>> >>>
>> >>
>> >>
>> >
>>
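
For anyone reading the debug output quoted above, the arithmetic can be
checked by hand. Below is a minimal sketch in Python, assuming the classic
Lucene TF-IDF formulas (tf = sqrt(termFreq), idf = 1 + ln(maxDocs / (docFreq + 1))),
which appear to match the tf and idf values in the output. The function and
variable names are made up for illustration; queryNorm and the fieldNorm
values are copied from the output rather than derived.

```python
import math

# Values copied from the quoted debug output (not computed here).
QUERY_NORM = 0.01193658
MAX_DOCS = 91058


def idf(doc_freq, max_docs=MAX_DOCS):
    # Classic Lucene TF-IDF idf (assumed): 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(term_freq):
    # Classic Lucene TF-IDF tf (assumed): sqrt(termFreq)
    return math.sqrt(term_freq)


def clause_score(term_freq, doc_freq, field_norm, boost=1.0):
    # weight = queryWeight * fieldWeight, as laid out in the explain tree
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = tf(term_freq) * idf(doc_freq) * field_norm
    return query_weight * field_weight


# One clause per (term, field) pair that matched in doc 249.
china_content = clause_score(24, 46832, 0.0048828125)
china_title = clause_score(1, 1861, 0.09375, boost=10.0)
snowden_content = clause_score(3, 14246, 0.0048828125)

# Each term takes the max over the fields it matched in ("max of"),
# then the per-term scores are added ("sum of").
china = max([china_content, china_title])
snowden = max([snowden_content])  # no title clause: "snowden" is not in the title
total = china + snowden

print(china_content, china_title, snowden_content, total)
```

The point of the sketch: the boost multiplies only a title clause, and doc 249
has no title clause for "snowden", so a larger boost cannot raise that term's
contribution for this document.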
