lucene-solr-user mailing list archives

From Diego Ceccarelli <diego.ceccare...@gmail.com>
Subject BM25F ranking function on SOLR4.0
Date Fri, 23 Nov 2012 17:39:48 GMT
Hi all,
I'm going to implement the BM25F ranking function on Solr 4.0 [1].
I started from the BM25Similarity class and modified it to handle
multiple fields. The problem is that, if I understood correctly, with
the default boolean search you have to use copyFields to search on more
than one field, so the stats that I get (e.g., the average field length)
refer to the virtual copyField and not to the 'real' fields that matched.
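
To illustrate why the copyField statistics are not usable here, a
minimal sketch in plain Python (not the Lucene API). The field names
and token counts are made up for the example; it only shows that the
average length of the combined field differs from the per-field
averages that BM25F needs.

```python
# Hypothetical token counts per document for two 'real' fields.
docs = [
    {"title": 4, "body": 200},
    {"title": 6, "body": 100},
]

def avg_len(docs, field):
    """Average length of a single real field."""
    return sum(d[field] for d in docs) / len(docs)

def avg_len_copyfield(docs, fields):
    """Average length of a virtual field that concatenates `fields`."""
    return sum(sum(d[f] for f in fields) for d in docs) / len(docs)

print(avg_len(docs, "title"))
print(avg_len(docs, "body"))
print(avg_len_copyfield(docs, ["title", "body"]))
```

Normalizing a title match by the combined average would treat every
title as extremely short, which is not what per-field length
normalization intends.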

I tried using dismax with the relevant fields, and it works, but if a
term matches in several fields, dismax combines the per-field scores by
taking the maximum value rather than summing the subscores, as I would
need.
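
The difference between the two combination rules can be sketched as
follows, in plain Python rather than the Lucene API. The per-field
scores are invented, and the function names are hypothetical. The `tie`
argument mirrors the tie-breaker of a disjunction-max query: the score
is the maximum plus `tie` times the sum of the other subscores.

```python
# Hypothetical per-field scores for one term that matches in two fields.
field_scores = {"title": 1.8, "body": 0.6}

def dismax_combine(scores, tie=0.0):
    """Maximum subscore plus tie * (sum of the remaining subscores)."""
    best = max(scores.values())
    return best + tie * (sum(scores.values()) - best)

def sum_combine(scores):
    """Plain sum of the subscores."""
    return sum(scores.values())

print(dismax_combine(field_scores))           # maximum only
print(dismax_combine(field_scores, tie=1.0))  # behaves like a sum
print(sum_combine(field_scores))
```

With a tie-breaker of 1.0 the combination becomes a sum of per-field
scores. That is still not BM25F, which sums the weighted term
frequencies across fields before applying the saturation, but it may
serve as a rough approximation.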

Moreover, after computing the score over each field, I would need to
normalize the total sum using the saturation factor k1 (see the formula
in [1]). So if I perform the scoring by running separate queries for
the different terms, at the end I'll have to collect the scores and
combine them myself.
Do you think I should write another QueryParser to handle the problem?
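
For reference, a minimal sketch of the computation in plain Python. It
assumes the commonly stated form of BM25F (boosted, length-normalized
term frequencies summed over the fields, then saturated once with k1
and multiplied by the idf); it should be checked against the formula
in [1]. All names, the dictionary layout, and the default k1 = 1.2 are
assumptions made for the example, not part of Lucene or Solr.

```python
import math

def bm25f_term_weight(tf_by_field, len_by_field, avg_len_by_field, boost, b):
    """Sum over fields of boost * tf / length normalization."""
    weight = 0.0
    for field, tf in tf_by_field.items():
        norm = (1.0 - b[field]) + b[field] * len_by_field[field] / avg_len_by_field[field]
        weight += boost[field] * tf / norm
    return weight

def idf(num_docs, doc_freq):
    """Robertson/Sparck Jones idf; negative when doc_freq > num_docs / 2."""
    return math.log((num_docs - doc_freq + 0.5) / (doc_freq + 0.5))

def bm25f_score(query_terms, doc, stats, boost, b, k1=1.2):
    """Saturate the combined per-term weight once, after summing the fields."""
    score = 0.0
    for term in query_terms:
        tf_by_field = doc["tf"].get(term, {})
        w = bm25f_term_weight(tf_by_field, doc["len"], stats["avg_len"], boost, b)
        score += idf(stats["num_docs"], stats["df"].get(term, 0)) * w / (k1 + w)
    return score

# Hypothetical document and collection statistics.
doc = {"tf": {"solr": {"title": 1, "body": 2}}, "len": {"title": 5, "body": 150}}
stats = {"avg_len": {"title": 5, "body": 150}, "num_docs": 100, "df": {"solr": 10}}
boost = {"title": 2.0, "body": 1.0}
b = {"title": 0.75, "body": 0.75}

print(bm25f_score(["solr"], doc, stats, boost, b))
```

The point of the sketch is the order of operations: the saturation
w / (k1 + w) is applied once per term to the cross-field sum, so it
cannot be recovered by saturating each field separately and adding the
results afterwards.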


[1] http://nlp.uned.es/~jperezi/Lucene-BM25/
-- 
Computers are useless. They can only give you answers.
(Pablo Picasso)
_______________
Diego Ceccarelli
High Performance Computing Laboratory
Information Science and Technologies Institute (ISTI)
Italian National Research Council (CNR)
Via Moruzzi, 1
56124 - Pisa - Italy

Phone: +39 050 315 3055
Fax: +39 050 315 2040
________________________________________
