lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From jian chen <>
Subject Re: trying to boost a phrase higher than its individual words
Date Fri, 28 Oct 2005 03:04:56 GMT

It seems what you want to achieve could be implemented using the Cover
Density algorithm. I am not sure if any existing query classes in the Lucene
distribution does this already. But in case not, this is what I am think

Make a custom query class, called CoverDensityQuery, which is modeled after

The CoverDensityQuery could accept two arguments as its constructor, the
Terms and the numOfTermsMatched.

For example, to search for "classical music", you will first construct
CoverDensityQuery like:
new CoverDensityQuery(new String[]{"classical", "music"}, 2);

This should return all documents that contain both "classical" and "music".
The ranking will be based on covers, each cover is a span with the two terms
at each end. The shorter the cover, the higher the rank, the more the
covers, the higher the rank.

If the returned documents are not enough, then, do another query like:
new CoverDensityQuery(new String[]{"classical", "music"}, 1);

This should return documents either containing "classical" or "music", but
not both.

The detailed algorithm will be constructed similar to PhraseQuery.

I will write such a query class in the future, just as a proof of concept
for cover density algorithm.



On 10/27/05, Andy Lee <> wrote:
> I have a situation where I want to search for individual words in a
> phrase as well as the phrase itself. For example, if the user enters
> ["classical music"] (with quotes) I want to find documents that
> contain "classical music" (the phrase) *and* the individual words
> "classical" and "music".
> Of course, I could just search for the individual words and the
> phrase would get found as a consequence. But I want documents
> containing the phrase to appear first in the search results, since
> the phrase is the user's primary interest.
> I've constructed the following query, using boost values...
> [+(content:"classical music"^5.0 content:classical^0.1
> content:music^0.1)]
> ...but the boost values don't seem to affect the order of the search
> results.
> Am I misunderstanding the purpose or proper usage of boosts, and if
> so, can someone explain (at least roughly) how to achieve the desired
> result?
> --Andy
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message