lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Query ReRanking question
Date Fri, 16 Jan 2015 17:53:33 GMT
Ravi:

Yep, this is the standard way to have recency influence the rank rather than
take over absolute ordering via a sort=date_time or similar.

Of course how strongly the rank is influenced is "more an art than a science"
as far as figuring out what actual constants to put in....

Best,
Erick
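
To get a feel for how the constants in the boost quoted below behave, here is a minimal Python sketch. It assumes Solr's documented definition recip(x,m,a,b) = a/(m*x+b) and treats the output of scale(query($q),0,1) as a relevancy value already normalized to [0,1]. The helper names (recip, boost, MS_PER_DAY) are illustrative only and are not part of Solr.

```python
# Toy model of: max(recip(ms(NOW/HOUR,publish_date),7.889e-10,1,1), scale(query($q),0,1))
# Assumption: recip(x, m, a, b) = a / (m*x + b), as in Solr's function query docs.

MS_PER_DAY = 24 * 60 * 60 * 1000


def recip(x, m, a, b):
    """Reciprocal function: a / (m*x + b)."""
    return a / (m * x + b)


def boost(age_ms, scaled_relevancy):
    """Take whichever is larger: the recency value or the scaled relevancy.

    scaled_relevancy stands in for scale(query($q),0,1), i.e. a value in [0, 1].
    """
    recency = recip(age_ms, 7.889e-10, 1, 1)
    return max(recency, scaled_relevancy)


if __name__ == "__main__":
    # Print how the recency term decays with document age.
    for days in (0, 7, 30, 365):
        print(days, round(recip(days * MS_PER_DAY, 7.889e-10, 1, 1), 3))
```

With these constants the recency term starts at 1 for a brand-new document and falls to 0.5 once m*x reaches 1, so changing m moves the age at which relevancy starts to win the max().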

On Fri, Jan 16, 2015 at 8:03 AM, Ravi Solr <ravisolr@gmail.com> wrote:
> As per Erick's suggestion, I am reposting my response to the group. Joel and
> Erick, thank you very much for helping me out with the ReRanking question a
> while ago.
>
> I have an alternative which seems to be working better for me than
> ReRanking. Can you kindly let me know of any pitfalls that you can
> think of with this approach? Since we value relevancy and recency at
> the same time, even though both are mutually exclusive, I thought maybe I
> could use function queries to adjust the boost as follows:
>
> boost=max(recip(ms(NOW/HOUR,publish_date),7.889e-10,1,1),scale(query($q),0,1))
>
> What I intended here is that if a query matches a more recent doc, recency is
> taken into consideration; however, if the relevancy is better than the
> date boost, we keep relevancy. What do you think?
>
> Thanks,
>
> Ravi Kiran Bhaskar
>
>
> On Mon, Sep 8, 2014 at 12:35 PM, Ravi Solr <ravisolr@gmail.com> wrote:
>
>> Joel and Erick,
>>            Thank you very much for explaining how the ReRanking works. Now
>> it's a bit clearer.
>>
>> Thanks,
>>
>> Ravi Kiran Bhaskar
>>
>> On Sun, Sep 7, 2014 at 4:45 PM, Joel Bernstein <joelsolr@gmail.com> wrote:
>>
>>> Oops wrong usage pattern. It should be:
>>>
>>> 1) Main query is sorted by a field (scores tracked silently in the
>>> background).
>>> 2) Reranker is reRanking docs based on the score from the main query.
>>>
>>>
>>>
>>> Joel Bernstein
>>> Search Engineer at Heliosearch
>>>
>>>
>>> On Sun, Sep 7, 2014 at 4:43 PM, Joel Bernstein <joelsolr@gmail.com>
>>> wrote:
>>>
>>> > Ok, just reviewed the code. The ReRankQParserPlugin always tracks the
>>> > scores from the main query. So this explains things. Speaking of explaining
>>> > things, the ReRankQParserPlugin also works with Lucene's explain. So if
>>> > you use debugQuery=true we should see that the score from the initial query
>>> > was combined with the score from the reRankQuery, which should be 1.
>>> >
>>> > You have stumbled on an interesting usage pattern which I never considered.
>>> > But basically what's happening is:
>>> >
>>> > 1) Main query is sorted by score.
>>> > 2) Reranker is reRanking docs based on the score from the main query.
>>> >
>>> > No worries, Erick, you've taught me a lot over the past couple of years!
>>> >
>>> > Joel Bernstein
>>> > Search Engineer at Heliosearch
>>> >
>>> >
>>> > On Sun, Sep 7, 2014 at 11:37 AM, Erick Erickson <erickerickson@gmail.com>
>>> > wrote:
>>> >
>>> >> Joel:
>>> >>
>>> >> I find that whenever I say something totally wrong publicly, I
>>> >> remember the correction really really well...
>>> >>
>>> >> Thanks for straightening that out!
>>> >> Erick
>>> >>
>>> >> On Sat, Sep 6, 2014 at 12:58 PM, Joel Bernstein <joelsolr@gmail.com>
>>> >> wrote:
>>> >> > The following query:
>>> >> >
>>> >> > http://localhost:8080/solr/select?q=malaysian airline crash&rq={!rerank
>>> >> > reRankQuery=$rqq reRankDocs=1000}&rqq=*:*&sort=publish_date
>>> >> > desc&fl=headline,publish_date,score
>>> >> >
>>> >> > is doing the following:
>>> >> >
>>> >> > The main query is sorted by publish_date. Then the results are reranked by
>>> >> > *:*, which in theory would have no effect at all.
>>> >> >
>>> >> > The reranker only uses the reRankQuery to re-rank the results. The sort
>>> >> > param will always apply to the main query.
>>> >> >
>>> >> > Joel Bernstein
>>> >> > Search Engineer at Heliosearch
>>> >> >
>>> >> >
>>> >> > On Sat, Sep 6, 2014 at 2:33 PM, Ravi Solr <ravisolr@gmail.com> wrote:
>>> >> >
>>> >> >> Erick,
>>> >> >>         Your idea about reversing Joel's suggestion seems to give the best
>>> >> >> results of all the options I tried... but I can't seem to understand why. I
>>> >> >> thought the query shown below would give irrelevant results, as sorting by
>>> >> >> date would throw relevancy off... but somehow it's getting relevant results
>>> >> >> with fair enough reverse chronology. It is as if the sort is applied after
>>> >> >> the docs are collected and reranked (which is what I wanted). One more
>>> >> >> thing that baffled me was that if I change reRankDocs from 1000 to 100, the
>>> >> >> results become irrelevant, which doesn't make sense.
>>> >> >>
>>> >> >> So can you kindly explain what's going on in the following query?
>>> >> >>
>>> >> >> http://localhost:8080/solr/select?q=malaysian airline crash&rq={!rerank
>>> >> >> reRankQuery=$rqq reRankDocs=1000}&rqq=*:*&sort=publish_date
>>> >> >> desc&fl=headline,publish_date,score
>>> >> >>
>>> >> >> I love the Solr community, so much to learn from so many knowledgeable
>>> >> >> people.
>>> >> >>
>>> >> >> Thanks
>>> >> >>
>>> >> >> Ravi Kiran Bhaskar
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> On Fri, Sep 5, 2014 at 1:23 PM, Erick Erickson <erickerickson@gmail.com>
>>> >> >> wrote:
>>> >> >>
>>> >> >> > OK, why can't you switch the clauses from Joel's suggestion?
>>> >> >> >
>>> >> >> > Something like:
>>> >> >> > q=Malaysia plane crash&rq={!rerank reRankDocs=1000
>>> >> >> > reRankQuery=$myquery}&myquery=*:*&sort=date+desc
>>> >> >> >
>>> >> >> > (haven't tried this yet, but you get the idea....).
>>> >> >> >
>>> >> >> > Best,
>>> >> >> > Erick
>>> >> >> >
>>> >> >> > On Fri, Sep 5, 2014 at 9:33 AM, Markus Jelsma
>>> >> >> > <markus.jelsma@openindex.io> wrote:
>>> >> >> > > Hi - You can already achieve this by boosting on the document's recency.
>>> >> >> > > The result set won't be exactly ordered by date but you will get the most
>>> >> >> > > relevant and recent documents on top.
>>> >> >> > >
>>> >> >> > > Markus
>>> >> >> > >
>>> >> >> > > -----Original message-----
>>> >> >> > >> From: Ravi Solr <ravisolr@gmail.com>
>>> >> >> > >> Sent: Friday 5th September 2014 18:06
>>> >> >> > >> To: solr-user@lucene.apache.org
>>> >> >> > >> Subject: Re: Query ReRanking question
>>> >> >> > >>
>>> >> >> > >> Thank you very much for responding. I want to do exactly the opposite of
>>> >> >> > >> what you said. I want to sort the relevant docs in reverse chronology. If
>>> >> >> > >> you sort by date beforehand then the relevancy is lost. So I want to get
>>> >> >> > >> the top N relevant results and then rerank those top N to achieve relevant
>>> >> >> > >> reverse chronological results.
>>> >> >> > >>
>>> >> >> > >> If you ask why I would want to do that:
>>> >> >> > >>
>>> >> >> > >> Let's take an example about the Malaysian airline crash. Several articles
>>> >> >> > >> might have been published over a period of time. When I search for - malaysia
>>> >> >> > >> airline crash blackbox - I would want to see "relevant" results but would
>>> >> >> > >> also like to see the recent developments on top, i.e. effectively a
>>> >> >> > >> reverse chronological order within the relevant results, like telling a
>>> >> >> > >> story over a period of time.
>>> >> >> > >>
>>> >> >> > >> Hope I am clear. Thanks for your help.
>>> >> >> > >>
>>> >> >> > >> Thanks
>>> >> >> > >>
>>> >> >> > >> Ravi Kiran Bhaskar
>>> >> >> > >>
>>> >> >> > >>
>>> >> >> > >> On Thu, Sep 4, 2014 at 5:08 PM, Joel Bernstein <joelsolr@gmail.com> wrote:
>>> >> >> > >>
>>> >> >> > >> > If you want the main query to be sorted by date then the top N docs
>>> >> >> > >> > reranked by a query, that should work. Try something like this:
>>> >> >> > >> >
>>> >> >> > >> > q=foo&sort=date+desc&rq={!rerank reRankDocs=1000
>>> >> >> > >> > reRankQuery=$myquery}&myquery=blah
>>> >> >> > >> >
>>> >> >> > >> >
>>> >> >> > >> > Joel Bernstein
>>> >> >> > >> > Search Engineer at Heliosearch
>>> >> >> > >> >
>>> >> >> > >> >
>>> >> >> > >> > On Thu, Sep 4, 2014 at 4:25 PM, Ravi Solr <ravisolr@gmail.com> wrote:
>>> >> >> > >> >
>>> >> >> > >> > > Can the ReRanking API be used to sort within docs retrieved by a date
>>> >> >> > >> > > field? Can somebody help me understand how to write such a query?
>>> >> >> > >> > >
>>> >> >> > >> > > Thanks
>>> >> >> > >> > >
>>> >> >> > >> > > Ravi Kiran Bhaskar
>>> >> >> > >> > >
>>> >> >> > >> >
>>> >> >> > >>
>>> >> >> > >
>>> >> >> >
>>> >> >>
>>> >>
>>> >
>>> >
>>>
>>
>>
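
The behavior Joel describes earlier in the thread (main query sorted by a field, scores tracked in the background, then the top reRankDocs documents reordered by score) can be illustrated with a toy model in Python. This is only a sketch under stated assumptions, not Solr's implementation: the Doc type, the sample documents, the rerank_top_n helper, and the additive score combination with a weight parameter are all hypothetical. It does suggest why shrinking reRankDocs from 1000 to 100 can push relevant but older documents out of the reranked window.

```python
from typing import List, NamedTuple


class Doc(NamedTuple):
    headline: str
    publish_date: str  # ISO date string, so lexicographic order is chronological
    score: float       # relevance score from the main query (tracked, not used for sort)


def rerank_top_n(docs: List[Doc], n: int,
                 rerank_score: float = 1.0, weight: float = 1.0) -> List[Doc]:
    """Toy model: sort by date desc, then reorder only the first n docs by score.

    rerank_score models a reRankQuery of *:*, which scores every doc the same,
    so within the window the order is decided by the main query's score.
    The additive combination and 'weight' are assumptions for illustration.
    """
    by_date = sorted(docs, key=lambda d: d.publish_date, reverse=True)
    head, tail = by_date[:n], by_date[n:]
    head = sorted(head, key=lambda d: d.score + weight * rerank_score, reverse=True)
    return head + tail  # docs outside the window keep their date order


docs = [
    Doc("A", "2014-09-01", 0.2),
    Doc("B", "2014-08-01", 0.9),
    Doc("C", "2014-07-01", 0.5),
    Doc("D", "2014-03-10", 1.5),  # most relevant, but oldest
]

if __name__ == "__main__":
    print([d.headline for d in rerank_top_n(docs, n=4)])
    print([d.headline for d in rerank_top_n(docs, n=2)])
```

In this model a window that covers every match yields pure relevancy order, while a small window only reorders the most recent few documents and leaves the rest in date order.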
