lucene-java-user mailing list archives

From Michael Sokolov
Subject Re: Using Lucene as a Document Comparison Tool
Date Fri, 13 Dec 2019 17:40:03 GMT
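Have you tried building a BooleanQuery with one TermQuery for every word in
the query document, each added as an optional clause
(BooleanClause.Occur.SHOULD)? A document then needs to match only one of the
terms to be returned, so you will get a lot of matches, ranked by the
configured Similarity.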
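
A minimal sketch of that suggestion might look like the following. It assumes
lucene-core 8.x or later on the classpath. The class name DocumentAsQuery, the
method name build, and the field name "body" are hypothetical, chosen only for
illustration. Collapsing repeated words into a single clause is a choice made
for this sketch, not something Lucene requires.

```java
import java.io.IOException;
import java.util.LinkedHashSet;
import java.util.Set;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

// Hypothetical helper: turns a whole "query document" into one BooleanQuery.
public class DocumentAsQuery {

    /** Builds one optional (SHOULD) TermQuery per distinct token of queryText. */
    public static BooleanQuery build(Analyzer analyzer, String field, String queryText)
            throws IOException {
        // Analyze the text the same way the indexed field was analyzed,
        // keeping each distinct token once, in order of first appearance.
        Set<String> terms = new LinkedHashSet<>();
        try (TokenStream ts = analyzer.tokenStream(field, queryText)) {
            CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                terms.add(termAtt.toString());
            }
            ts.end();
        }

        // Every clause is SHOULD, so a document matching any term is a hit.
        BooleanQuery.Builder builder = new BooleanQuery.Builder();
        for (String t : terms) {
            builder.add(new TermQuery(new Term(field, t)), BooleanClause.Occur.SHOULD);
        }
        return builder.build();
    }

    public static void main(String[] args) throws IOException {
        // "body" is an assumed field name; use whatever field holds your text.
        try (Analyzer analyzer = new StandardAnalyzer()) {
            Query q = build(analyzer, "body", "quick brown fox jumps over a lazy dog");
            System.out.println(q);
        }
    }
}
```

The resulting query can be passed to IndexSearcher.search(query, n). Documents
that match more of the terms, or rarer ones, generally score higher under
BM25Similarity or ClassicSimilarity. The default limit is 1024 clauses per
query, which the 50 to 100 terms described in the question stay well under.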

On Thu, Dec 12, 2019 at 10:47 AM John Brown wrote:
> Hi,
> I have some questions about how to use Lucene for the specific purpose of
> finding document similarities. Lucene seems to have classes that were made
> for this, including ClassicSimilarity and BM25Similarity. However, I’m
> fumbling a bit when it comes to implementing them.
> From what I understand, to use these classes you simply set the similarity
> of your IndexWriter and IndexSearcher, then submit a query. The documents
> returned from your query should be ordered from highest to lowest
> similarity.
> My initial thought was to just use a phrase query to hold the "document" I
> want to find similarities to, but phrase queries are limited in that they
> will only return results that are deemed to fall within a certain slop
> value. Term/Boolean queries are similarly limited in that they allow
> documents to be sorted only if they contain all the terms in the query.
> Ideally, I’d like to submit a query that would essentially be a document
> itself. Each of my queries contains 10 or so phrases, each with 5-10
> terms. I would like to compare this query with all the documents in my
> index to see which is the most similar, and which is the least similar. I
> feel as if there is an easy way to do this that I'm missing; after all, I
> essentially just want to remove a step from the process. Any help would be
> much appreciated.
> Thank you,
> -John B
