lucene-solr-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: scoring individual values in a multivalued field
Date Sun, 07 Sep 2008 01:20:47 GMT
: I have a multivalued field that I would want to score individually for each
: value. Is there an easy way to do that?

Lucene-Java has a (somewhat new) feature called "Payloads" which allows
for things like this. It is built around the idea that when indexing, any
Token can contain an arbitrary data payload which is persisted along with
the TermPosition info in the index. At query time, different types of
queries can use/abuse that payload any way they want.
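
To make the idea concrete, here is a toy model in Python. It is not the
Lucene API, just a sketch of the concept: each token carries a few bytes
of payload (here a per-value weight packed as a 4-byte float), and a
query-time scorer decides how to interpret them. All names (Token,
encode_weight, tokenize_weighted, score) are made up for illustration.

```python
import struct
from collections import namedtuple

# Toy stand-in for an indexed token: term text, position, opaque payload bytes.
Token = namedtuple("Token", ["term", "position", "payload"])

def encode_weight(weight):
    """Pack a per-value weight into 4 bytes (big-endian float)."""
    return struct.pack(">f", weight)

def decode_weight(payload):
    """Unpack the 4-byte payload back into a float."""
    return struct.unpack(">f", payload)[0]

def tokenize_weighted(values):
    """Index time: turn (term, weight) pairs into tokens carrying payloads."""
    return [Token(term, pos, encode_weight(weight))
            for pos, (term, weight) in enumerate(values)]

def score(tokens, query_term):
    """Query time: one possible use of the payload, summing the weights
    of every token that matches the query term."""
    return sum(decode_weight(t.payload) for t in tokens if t.term == query_term)
```

A scorer could just as well take the max or the average of the payload
weights; that freedom is the "use/abuse" part.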

Currently payload support in Solr is somewhat limited.  If you have a
custom Analyzer or Tokenizer/TokenFilter that knows about Payloads, the
payloads will make it into the index, but you would need to write a custom
Similarity and QParserPlugin to take advantage of them (there's already a
BoostingTermQuery in Lucene that you can leverage).

Payloads are a really powerful feature, but the fact that they can be used
in *sooooo* many different ways is probably the biggest reason why Solr
doesn't have any features yet to make payloads easier to use just via
configuration.

At the moment, the simplest mechanisms for achieving something like what
you are describing that I know of are:
  1) repetitive values.  Add a value twice to make it count (roughly)
     twice as much. (eliminating lengthNorm and customizing your
     Similarity are necessary to make it worth exactly twice as much)
  2) different fields.  Partition the spectrum of "importance" for your
     values into N buckets, make a field for each bucket, put the value in
     the bucket that makes the most sense, and at query time query for
     each bucket with a different query time boost.
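
A rough sketch of both workarounds in Python, client side only. The field
names (tags_high, tags_med, tags_low) and the boost values are made up
for illustration; the only real syntax assumed is the Lucene query
parser's field:term^boost form. The default_tf helper reflects my
understanding that DefaultSimilarity uses sqrt(freq) for the tf factor,
which is why a repeated value counts for less than exactly double.

```python
import math

# Hypothetical bucket fields and boosts, purely for illustration.
BUCKET_BOOSTS = {"tags_high": 4.0, "tags_med": 2.0, "tags_low": 1.0}

def repeat_value(value, weight):
    """Option 1: repeat a value `weight` times in the multivalued field."""
    return [value] * weight

def default_tf(freq):
    """tf factor as assumed for DefaultSimilarity: sqrt(freq)."""
    return math.sqrt(freq)

def bucket_query(term, boosts=BUCKET_BOOSTS):
    """Option 2: search every bucket field for the term, each clause with
    its own query-time boost. The term is assumed to contain no
    query-syntax special characters (no escaping is done here)."""
    clauses = ["%s:%s^%s" % (field, term, boost)
               for field, boost in boosts.items()]
    return " OR ".join(clauses)
```

For example, bucket_query("solr") builds
"tags_high:solr^4.0 OR tags_med:solr^2.0 OR tags_low:solr^1.0", while
default_tf(2) is about 1.41 rather than 2, before lengthNorm is even
considered.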

: 2) the value of normField is persisted as a byte in the index and the
: precision loss hurts.

For a field like what you are describing, you'll probably want to
omitNorms completely just to make sure docs with lots of values aren't
penalized.
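
To see the size of that penalty, here is the arithmetic as a small Python
sketch. It assumes DefaultSimilarity's lengthNorm of 1/sqrt(numTerms) and
that each value contributes a single token; the function name is mine,
not Lucene's.

```python
import math

def length_norm(num_terms):
    """lengthNorm as assumed for DefaultSimilarity: 1 / sqrt(numTerms)."""
    return 1.0 / math.sqrt(num_terms)

# A doc with one value vs. a doc with sixteen single-token values.
one_value = length_norm(1)        # 1.0
sixteen_values = length_norm(16)  # 0.25
```

So the same matching value scores a quarter as much in the doc with
sixteen values, before the byte encoding of the norm loses further
precision. With norms omitted, the field contributes the same norm for
every doc.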



-Hoss

