lucene-java-user mailing list archives

From Grant Ingersoll <>
Subject Re: Vector Model and Relevance Feedback
Date Wed, 02 Nov 2005 15:44:28 GMT
Others can correct me if I am wrong, but I don't think a "pure" Rocchio 
feedback loop is possible in the current state, since Lucene doesn't 
currently support negative boosts.  Having 
said that, what we do, in a nutshell, is similar to what you describe:
For the positive examples, store the terms and a boost factor.  The 
boost factor is the frequency of the term across all the positive 
examples multiplied by beta.
Then for the negative examples, decrement the boost factor by gamma 
times the frequency of the term in all the negative examples.  Remove 
any terms that have a boost of zero or less.
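
The steps above can be sketched in plain Java as follows. This is only an illustration under assumptions: the class name FeedbackBoosts, the method compute, and the sample beta/gamma values are hypothetical, and the term frequencies are assumed to have already been summed across the positive and negative example documents. It does not call any Lucene API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: turns summed term frequencies
// from the positive and negative examples into per-term boost factors.
class FeedbackBoosts {

    static Map<String, Double> compute(Map<String, Integer> positiveFreqs,
                                       Map<String, Integer> negativeFreqs,
                                       double beta, double gamma) {
        Map<String, Double> boosts = new HashMap<>();

        // Positive examples: boost = beta * frequency across all positive docs.
        for (Map.Entry<String, Integer> e : positiveFreqs.entrySet()) {
            boosts.put(e.getKey(), beta * e.getValue());
        }

        // Negative examples: subtract gamma * frequency across all negative docs.
        // Terms seen only in negative docs are skipped, because their boost
        // would be non-positive and they would be removed anyway.
        for (Map.Entry<String, Integer> e : negativeFreqs.entrySet()) {
            boosts.computeIfPresent(e.getKey(), (term, b) -> b - gamma * e.getValue());
        }

        // Remove any terms that have a boost of zero or less.
        boosts.values().removeIf(b -> b <= 0.0);
        return boosts;
    }

    public static void main(String[] args) {
        Map<String, Integer> pos = new HashMap<>();
        pos.put("lucene", 4);
        pos.put("vector", 2);
        Map<String, Integer> neg = new HashMap<>();
        neg.put("vector", 3);
        neg.put("spam", 5);

        // beta and gamma here are arbitrary sample values.
        System.out.println(compute(pos, neg, 0.75, 0.5)); // prints {lucene=3.0}
    }
}
```

Keeping the weights in a map keyed by term also sidesteps the problem of vectors with different lengths: a term that is missing from a document simply contributes nothing.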

In the end, you construct a new query out of the terms and boosts that 
you can submit.  I think it is more of an approximation of Rocchio, but 
I have had good results from it.  You also probably want to limit the 
number of terms per document you add, at least if you are concerned 
about performance.
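
As a rough sketch of that last step, the term/boost pairs could be turned into a query string using the term^boost form of the Lucene query parser syntax. The class name FeedbackQuery, the method toQueryString, and the maxTerms cap are hypothetical. Note that this sketch caps the total number of terms in the final query, which is a simpler variant of limiting terms per document, and it assumes the terms contain no query-syntax special characters that would need escaping.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Lucene: builds a query string such as
// "lucene^3.00 search^1.50" from a map of term -> boost.
class FeedbackQuery {

    static String toQueryString(Map<String, Double> boosts, int maxTerms) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(boosts.entrySet());

        // Highest boost first; ties are broken alphabetically so the
        // output is deterministic.
        entries.sort((a, b) -> {
            int byBoost = Double.compare(b.getValue(), a.getValue());
            return byBoost != 0 ? byBoost : a.getKey().compareTo(b.getKey());
        });

        StringBuilder query = new StringBuilder();
        int added = 0;
        for (Map.Entry<String, Double> e : entries) {
            // Stop at the cap; skip anything with a non-positive boost.
            if (added >= maxTerms) {
                break;
            }
            if (e.getValue() <= 0.0) {
                continue;
            }
            if (query.length() > 0) {
                query.append(' ');
            }
            query.append(String.format(Locale.ROOT, "%s^%.2f", e.getKey(), e.getValue()));
            added++;
        }
        return query.toString();
    }

    public static void main(String[] args) {
        Map<String, Double> boosts = new HashMap<>();
        boosts.put("lucene", 3.0);
        boosts.put("search", 1.5);
        boosts.put("index", 0.75);

        System.out.println(toQueryString(boosts, 2)); // prints lucene^3.00 search^1.50
    }
}
```

The resulting string would then be handed to whatever query parser you already use for the original query.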

Stefan Gusenbauer wrote:

> I have some thoughts about Lucene and relevance feedback. I want to 
> implement a variation of the Rocchio formula, and I have run into a problem.
> The formula is like this:
> Query(new) = alpha * Query(old) + beta * Sum(Relevant Documents) - 
> gamma * Sum(Non Relevant Documents)
> The relevant documents in this formula should be in a vector 
> representation. This is the problem: if I work with TermFreqVectors, 
> the vectors are not equally long and contain different terms. My 
> solution for now is to take the TermFreqVectors, reduce them to the 
> least common multiple, and then perform the computation.
> So my questions are:
> Is this the only way to do so? (I hope not.)
> Is there an add-on for Lucene to get a real vector representation?
> Does anyone have experience with this issue?
> Thanks
> Stefan
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

Grant Ingersoll 
Sr. Software Engineer 
Center for Natural Language Processing 
Syracuse University 
School of Information Studies 
337 Hinds Hall 
Syracuse, NY 13244 
Voice:  315-443-5484 
Fax: 315-443-6886 
