lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Index time boosts, payloads, and long query strings
Date Mon, 23 Nov 2009 13:03:26 GMT
Yep <G>....

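The distinction being confirmed in this thread can be illustrated with a toy scoring model. This is only a sketch: the formula (query boost times index boost times term frequency) is a deliberate simplification and not Lucene's actual scoring, and the documents and the helper names `score`, `rank`, and `with_index_boost` are made up for illustration.

```python
# Toy scoring model, NOT Lucene's real formula: every matching clause
# contributes query_boost * index_boost * term_frequency.

def score(doc, clauses):
    """Sum the contribution of each (field, term, query_boost) clause."""
    total = 0.0
    for field, term, query_boost in clauses:
        tf = doc["fields"].get(field, "").lower().split().count(term)
        index_boost = doc.get("index_boosts", {}).get(field, 1.0)
        total += query_boost * index_boost * tf
    return total

def rank(docs, clauses):
    """Return ids of matching documents, best score first (ties by id)."""
    scored = [(score(d, clauses), d["id"]) for d in docs]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for s, doc_id in ordered if s > 0]

def with_index_boost(docs, field, boost_by_id):
    """Copy docs, attaching a per-document index-time boost for `field`."""
    return [dict(d, index_boosts={field: boost_by_id.get(d["id"], 1.0)})
            for d in docs]

# Hypothetical documents.
docs = [
    {"id": "wiki", "fields": {"title": "solr overview", "body": "search server"}},
    {"id": "blog", "fields": {"title": "my weekend", "body": "solr solr solr"}},
    {"id": "faq", "fields": {"title": "solr faq", "body": "questions"}},
]

title_only = [("title", "solr", 1.0)]
two_clauses = [("title", "solr", 1.0), ("body", "solr", 1.0)]
title_boosted = [("title", "solr", 4.0), ("body", "solr", 1.0)]

# Boosting *all* titles at index time leaves a title-only ranking unchanged.
all_titles = with_index_boost(docs, "title", {d["id"]: 4.0 for d in docs})
# Boosting one document's title says "this title matters more than others".
one_title = with_index_boost(docs, "title", {"wiki": 4.0})

print(rank(docs, title_only), rank(all_titles, title_only))
print(rank(one_title, title_only))
print(rank(docs, two_clauses), rank(docs, title_boosted))
```

In this toy model the query-time boost only changes the ordering once a second clause (here `body:solr`) lets documents without the term in the title match at all.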
On Mon, Nov 23, 2009 at 4:13 AM, Girish Redekar
<girish.redekar@aplopio.com> wrote:

> Thanks Erick!
>
> After reading your answer, and re-reading the Solr wiki, I realized my
> folly. I used to think that index-time boosts, when applied on a per-field
> basis, are equivalent to query-time boosts on that field.
>
> To ensure that my new understanding is correct, I'll state it in my own
> words. Index-time boosts determine the boost for a *document* if it is
> counted as a hit. Query-time boosts give you control over boosting the
> occurrence of a query in a specific field.
>
> Please correct me if I'm wrong (again) :-)
>
> Girish Redekar
> http://girishredekar.net
>
>
> On Sun, Nov 22, 2009 at 8:25 PM, Erick Erickson <erickerickson@gmail.com> wrote:
>
> > I still think they are apples and oranges. If you boost *all* titles,
> > you're effectively boosting none of them. Index time boosting
> > expresses "this document's title is more important than other
> > document titles." What I think you're after is "titles are more
> > important than other parts of the document."
> >
> > For the latter, you're talking query-time boosting. Boosting only
> > really makes sense if there are multiple clauses, something
> > like title:important OR body:unimportant. If this is true, speed
> > is irrelevant; you need correct behavior.
> >
> > Not that I think you'd notice either way. Modern computers
> > can do a LOT of FLOPS. Here's an experiment: time
> > some queries (but beware of timing the very first ones, see
> > the Wiki) with boosts and without boosts. I doubt you'll see
> > enough difference to matter (but please do report back if you
> > do, it'll further my education <G>).
> >
> > But, depending on your index structure, you may get this
> > anyway. Generally, matches on shorter fields weigh more
> > in the score calculations than on longer fields. If you have
> > fields like title and body and you are querying on title:term OR
> > body:term, documents with term in the title will tend toward
> > higher scores.
> >
> > But before putting too much effort into this, do you have any
> > evidence that the default behavior is unsatisfactory? Because
> > unless and until you do, I think this is a distraction <G>...
> >
> > Best
> > Erick
> >
> > On Sun, Nov 22, 2009 at 8:37 AM, Girish Redekar
> > <girish.redekar@aplopio.com> wrote:
> >
> > > Hi Erick -
> > >
> > > Maybe I mis-wrote.
> > >
> > > My question is: would "title:any_query^4.0" be faster/slower than
> > > applying an index time boost to the field title? Basically, if I take
> > > *every* user query and search for it in title with a boost (say, 4.0),
> > > is it different from saying field title has boost 4.0?
> > >
> > > Cheers,
> > > Girish Redekar
> > > http://girishredekar.net
> > >
> > >
> > > On Sun, Nov 22, 2009 at 2:02 AM, Erick Erickson <erickerickson@gmail.com> wrote:
> > >
> > > > I'll take a whack at index vs. query boosting. They express very
> > > > different concepts. Let's say we're interested in boosting the title
> > > > field....
> > > >
> > > > Index time boosting expresses "this document's title is X more
> > > > important than a normal document title". It doesn't matter *what* the
> > > > title is: any query that matches on anything in this document's title
> > > > will give this document a boost. I might use this to give preferential
> > > > treatment to all encyclopedia entries or something.
> > > >
> > > > Query time boosting, like "title:solr^4.0", expresses "any document
> > > > with solr in its title is more important than documents without solr
> > > > in the title". This really only makes sense if you have other clauses
> > > > that might cause a document *without* solr in the title to match......
> > > >
> > > > Since they are doing different things, efficiency isn't really
> > > > relevant.
> > > >
> > > > HTH
> > > > Erick
> > > >
> > > >
> > > > On Sat, Nov 21, 2009 at 2:13 AM, Girish Redekar
> > > > <girish.redekar@aplopio.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I'm relatively new to Solr/Lucene, and am using Solr (and not Lucene
> > > > > directly) primarily because I can use it without writing Java code
> > > > > (the rest of my project is coded in Python).
> > > > >
> > > > > My application has the following requirements:
> > > > > (a) ability to search over multiple fields, each with a different
> > > > > weight
> > > > > (b) if possible, the ability to add extra/diminished weights to
> > > > > particular tokens within a field
> > > > > (c) my query strings are long (50-100 words)
> > > > > (d) my index is 500K+ documents
> > > > >
> > > > > 1) The way to do (a) is field boosting (right?). My question is: is
> > > > > all field boosting done at query time, even if I give index time
> > > > > boosts to fields? Is there a performance advantage in boosting fields
> > > > > at index time vs. using something like fieldname:querystring^boost?
> > > > > 2) From what I've read, it seems that I can do (b) using payloads.
> > > > > However, as this link
> > > > > (http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/)
> > > > > suggests, I will have to write a payload-aware query parser. I wanted
> > > > > to confirm whether this is indeed the case, or is there an
> > > > > out-of-the-box way to implement payloads (I am using Solr 1.4)?
> > > > > 3) For my project, the user fills multiple text boxes (for each
> > > > > query). I combine these into a single query (with different treatment
> > > > > for the contents of each text box). Consequently, my query looks
> > > > > something like (fieldname1: queryterm1 queryterm2^2.0 queryterm3^3.0
> > > > > +queryterm4)^1.0. Are there any guidelines for improving performance
> > > > > of such a system (sorry, this bit is vague)?
> > > > >
> > > > > Any help with this will be great!
> > > > >
> > > > > Girish Redekar
> > > > > http://girishredekar.net
> > > > >
> > > >
> > >
> >
>
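The timing experiment suggested in the thread (compare queries with and without boosts, and do not trust the very first runs) could be sketched roughly as below. The `run_query` callable is an assumption: it stands for whatever function sends a query string to Solr in your setup. `fake_run_query` is a hypothetical stand-in so the sketch runs without a server.

```python
import time

def time_query(run_query, query, warmup=3, repeats=10):
    """Return mean wall-clock seconds per call, ignoring warm-up calls."""
    for _ in range(warmup):
        run_query(query)  # discarded: early queries may pay for cold caches
    start = time.perf_counter()
    for _ in range(repeats):
        run_query(query)
    return (time.perf_counter() - start) / repeats

def fake_run_query(query):
    """Hypothetical stand-in for a real request to Solr."""
    return len(query.split())

plain = time_query(fake_run_query, "title:solr body:solr")
boosted = time_query(fake_run_query, "title:solr^4.0 body:solr")
print("plain: %.6fs  boosted: %.6fs" % (plain, boosted))
```

With a real `run_query`, averaging over a larger and more varied set of queries would give a more trustworthy comparison than a single query string.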

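Since the original poster builds one query from several text boxes in Python, here is a minimal sketch of assembling a per-field clause with per-term boosts. It assumes the standard Lucene query syntax for field grouping (`field:(...)`), `^` for boosts, and `+` for required terms, which is slightly different from the parenthesization written in the original message. The helper names `term_clause` and `field_clause` are hypothetical, and tokens are assumed to be plain words with no query-syntax special characters.

```python
def term_clause(token, boost=None, required=False):
    """Render one term, e.g. '+solr' or 'solr^2.0'."""
    prefix = "+" if required else ""
    suffix = "" if boost is None else "^%s" % boost
    return prefix + token + suffix

def field_clause(field, terms, field_boost=1.0):
    """Group (token, boost, required) triples under one boosted field."""
    body = " ".join(term_clause(*t) for t in terms)
    return "%s:(%s)^%s" % (field, body, field_boost)

# Hypothetical contents of one text box.
terms = [
    ("queryterm1", None, False),
    ("queryterm2", 2.0, False),
    ("queryterm3", 3.0, False),
    ("queryterm4", None, True),
]
query = field_clause("fieldname1", terms, 1.0)
print(query)
```

Clauses for the other text boxes could be built the same way and joined with spaces or explicit OR operators, depending on the default operator configured for the query parser.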