lucene-general mailing list archives

From Chris Hostetter <>
Subject Re: Similarity Class - A couple of questions
Date Sat, 15 Mar 2008 00:15:36 GMT

: The elements I want are the TF and the IDF. I want to use them to calculate
: my own score, since I don't want to have the hassle to change the whole core
: of the scoring machine. However, as I've understood, Lucene's notion of
: TF-IDF isn't mine's. So, I'm left with "translating" its values to my
: values. As for the IDF, I've managed to squeeze the elements I want from
: maxDocs and docFreq methods. And in two lines I have my IDF value. Regarding
: TF however, it's a little more complex.. I can't even get the value of the
: TF for each document, and I can't get near the values needed to calculate
: it.. from the explain method, I see that lucene HAS to calculate them,
: somehow, but I don't know where to look, how to look.

1) part of your confusion may be that the Similarity.tf and Similarity.idf
methods are not meant to give end users access to the raw IDF and TF values
for arbitrary terms and documents -- they are an extension point for
developers to inform Scorers how they want the IDF and TF components of the
scoring formula to be computed.  if you implement your own Similarity 
class, your tf(float) method will be called with the raw numeric "term 
frequency" for each term during scoring; your idf(int,int) method will be 
called with the raw numeric "document frequency" and "number of documents" 
for each term during scoring.  if you want to skip Similarity altogether
and get the "term frequency" for an arbitrary term outside of scoring then 
you need to use a TermDocs instance positioned at the Term you are 
interested in (you can get one from IndexReader.termDocs(Term)), then 
iterate to the doc you are interested in and read the freq().
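
As a rough illustration of that last step, here is a minimal sketch of walking a
term's postings to one document and reading its raw frequency. So that it runs
without Lucene on the classpath, it uses a hypothetical PostingCursor interface
that stands in for only the next(), doc() and freq() methods of TermDocs, and a
hypothetical in-memory ArrayPostings class. The customTfIdf formula is also just
an assumed example of a "custom" weight, not anything Lucene computes.

```java
final class RawTermStats {
    /** Returns the raw term frequency in targetDoc, or 0 if the term is absent. */
    static int rawFreq(PostingCursor postings, int targetDoc) {
        while (postings.next()) {
            int doc = postings.doc();
            if (doc == targetDoc) {
                return postings.freq();
            }
            if (doc > targetDoc) {
                break; // assumes postings are visited in increasing doc order
            }
        }
        return 0;
    }

    /** Hypothetical custom weight: raw tf times ln(numDocs / docFreq). */
    static double customTfIdf(int rawTf, int docFreq, int numDocs) {
        if (rawTf == 0 || docFreq == 0) {
            return 0.0;
        }
        return rawTf * Math.log((double) numDocs / docFreq);
    }

    public static void main(String[] args) {
        // Made-up postings: the term occurs in docs 2, 5 and 9.
        PostingCursor postings =
            new ArrayPostings(new int[] {2, 5, 9}, new int[] {1, 4, 2});
        int tf = rawFreq(postings, 5);
        System.out.println("raw tf = " + tf);
        System.out.println("custom weight = " + customTfIdf(tf, 3, 10));
    }
}

/** Stand-in for the three TermDocs methods this sketch relies on. */
interface PostingCursor {
    boolean next(); // advance to the next document containing the term
    int doc();      // current document number
    int freq();     // raw occurrences of the term in the current document
}

/** In-memory postings for a single term; exists only to make the sketch runnable. */
final class ArrayPostings implements PostingCursor {
    private final int[] docs;
    private final int[] freqs;
    private int pos = -1;

    ArrayPostings(int[] docs, int[] freqs) {
        this.docs = docs;
        this.freqs = freqs;
    }

    public boolean next() { return ++pos < docs.length; }
    public int doc() { return docs[pos]; }
    public int freq() { return freqs[pos]; }
}
```

Against a real index, the cursor would presumably be the TermDocs returned by
IndexReader.termDocs(Term), and the docFreq and numDocs arguments would come
from whatever index-level counts you are already using for your IDF.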

2) since your question is specifically about using the Lucene-Java APIs
(and not a "general" question about the Lucene project/community) you are
likely to get more useful responses to questions like this if you email the
java-user@lucene list, which is focused on discussions about using these
APIs, and has a much larger subscriber base than general@lucene.  if you
have followup questions, please post them there.

