lucene-solr-user mailing list archives

From: Diego Ceccarelli
Subject: BM25F ranking function on SOLR4.0
Date: Fri, 23 Nov 2012 17:39:48 GMT

Hi all,
I'm going to implement the BM25F ranking function on Solr 4.0 [1].
I started from the BM25Similarity class and modified it to handle
multiple fields. The problem is that, if I understood correctly, with
the default boolean search you have to use copyFields to search on
more than one field, so the statistics I get (e.g., average field
length) refer to the virtual copyField and not to the 'real' fields
that matched.

I tried using dismax with the relevant fields, and it works, but if a
term matches in several fields, dismax takes the maximum of the
per-field scores instead of summing the subscores, as I would need.
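
To make the difference concrete, here is a minimal sketch of how a disjunction-max combination behaves: the best per-field score plus `tie` times the sum of the remaining ones. The class and method names are hypothetical and this is not Lucene code; `tie` stands for the dismax tie-breaker parameter. With `tie = 0` the result is the pure maximum, and with `tie = 1` it becomes the plain sum of the subscores.

```java
/** Hypothetical illustration of disjunction-max score combination; not Lucene API. */
final class DisMaxSketch {

    /** Returns max(fieldScores) + tie * (sum of the other field scores). */
    static double disMax(double[] fieldScores, double tie) {
        double max = 0.0; // scores are assumed to be non-negative
        double sum = 0.0;
        for (double s : fieldScores) {
            sum += s;
            max = Math.max(max, s);
        }
        return max + tie * (sum - max);
    }
}
```

Note that even with `tie = 1` the values being added are per-field scores that have each already been saturated, which is not the same as summing term frequencies across fields and saturating once.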

Moreover, after computing the score over each field, I would need to
normalize the total sum using the saturation factor k1 (see the
formula in [1]). So if I do the scoring by running separate queries
for the different terms, at the end I'll have to collect the scores
and combine them myself.
Do you think I should write another QueryParser to handle this?
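
For reference, the combination described above can be sketched as follows. This is a minimal, hypothetical sketch and not Lucene or Solr API: the class `Bm25fSketch`, the `FieldStat` container, and the per-field `boost` and `b` parameters are assumptions made for illustration, and the IDF uses the form found in Lucene's BM25Similarity, which may differ from the formula in [1]. The point it shows is the order of operations: the length-normalized term frequencies are summed over the real fields first, and the k1 saturation is applied once to that sum.

```java
import java.util.List;

/** Hypothetical BM25F sketch for a single query term; none of this is Lucene/Solr API. */
final class Bm25fSketch {

    /** Illustrative per-field statistics for one term in one document. */
    static final class FieldStat {
        final double boost;       // field weight (assumed, tuned per field)
        final double b;           // length-normalization strength in [0, 1]
        final double termFreq;    // occurrences of the term in this field
        final double fieldLen;    // length of this field in the document
        final double avgFieldLen; // average length of this field in the index

        FieldStat(double boost, double b, double termFreq,
                  double fieldLen, double avgFieldLen) {
            this.boost = boost;
            this.b = b;
            this.termFreq = termFreq;
            this.fieldLen = fieldLen;
            this.avgFieldLen = avgFieldLen;
        }
    }

    /** Sums the boosted, length-normalized term frequencies over the real fields. */
    static double pseudoFrequency(List<FieldStat> fields) {
        double sum = 0.0;
        for (FieldStat f : fields) {
            double norm = 1.0 - f.b + f.b * (f.fieldLen / f.avgFieldLen);
            sum += f.boost * f.termFreq / norm;
        }
        return sum;
    }

    /** Applies the k1 saturation once, after the per-field sum. */
    static double saturate(double pseudoTf, double k1) {
        return pseudoTf / (k1 + pseudoTf);
    }

    /** IDF in the form used by Lucene's BM25Similarity (an assumption here). */
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** Score contribution of one query term for one document. */
    static double termScore(List<FieldStat> fields, double k1,
                            long docCount, long docFreq) {
        return idf(docCount, docFreq) * saturate(pseudoFrequency(fields), k1);
    }
}
```

The document score would then be the sum of `termScore` over the query terms, which is why the per-field statistics have to come from the real fields rather than from a copyField.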

Computers are useless. They can only give you answers.
(Pablo Picasso)
Diego Ceccarelli
High Performance Computing Laboratory
Information Science and Technologies Institute (ISTI)
Italian National Research Council (CNR)
Via Moruzzi, 1
56124 - Pisa - Italy

Phone: +39 050 315 3055
Fax: +39 050 315 2040
