lucene-general mailing list archives

From "Stephan Markwalder" <>
Subject Query Optimization
Date Fri, 06 May 2005 18:29:39 GMT
Hi there,

I am trying to optimize my query, but I think I have reached the point where I
need to extend Lucene's functionality.

I have a set of fairly simple queries; let's call them <A>, <B>, <C>, ...
Now I am trying to create a query that matches if AT LEAST 2 of these simple
queries match. I started with a query like the following one (building all
possible pairs and combining them with 'OR'):

  (<A> AND <B>) OR (<A> AND <C>) OR (<B> AND <C>) OR ...
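To see how quickly this pairwise form grows, here is a minimal sketch in plain Java (no Lucene classes; `PairwiseExpansion`, `expand` and `pairCount` are made-up names for illustration) that builds the query string by OR-ing every unordered pair of clauses. With n simple queries it produces n(n-1)/2 two-clause groups.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: builds "(X AND Y) OR ..." for
// every pair i < j. Clause names are placeholders for the simple queries.
class PairwiseExpansion {
    static String expand(List<String> clauses) {
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < clauses.size(); i++) {
            for (int j = i + 1; j < clauses.size(); j++) {
                pairs.add("(" + clauses.get(i) + " AND " + clauses.get(j) + ")");
            }
        }
        return String.join(" OR ", pairs);
    }

    // Number of two-clause groups for n simple queries: n(n-1)/2.
    static int pairCount(int n) {
        return n * (n - 1) / 2;
    }
}
```

For A, B and C, `expand` returns `(A AND B) OR (A AND C) OR (B AND C)`; with 10 simple queries, `pairCount(10)` is already 45.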

My first question:
Is it safe to create the simple Query objects (usually TermQuery objects)
once and then reuse them several times within one BooleanQuery tree, like this?


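import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

Query a = new TermQuery(new Term("name", "John"));
Query b = new TermQuery(new Term("name", "James"));
Query c = new TermQuery(new Term("name", "Jack"));

// Lucene 1.4.x signature: add(query, required, prohibited)
BooleanQuery q1 = new BooleanQuery(); // <A> AND <B>
q1.add(a, true, false); q1.add(b, true, false);
BooleanQuery q2 = new BooleanQuery(); // <A> AND <C>
q2.add(a, true, false); q2.add(c, true, false);
BooleanQuery q3 = new BooleanQuery(); // <B> AND <C>
q3.add(b, true, false); q3.add(c, true, false);

BooleanQuery q = new BooleanQuery();  // q1 OR q2 OR q3
q.add(q1, false, false);
q.add(q2, false, false);
q.add(q3, false, false);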

Back to the optimization problem:
I was able to optimize the above like this:

  (<A> AND (<B> OR <C> OR ...))
  OR (<B> AND (<C> OR ...))
  OR (<C> AND (...))
  OR ...
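The nested rewrite can be checked the same way. The sketch below is again plain Java with made-up names, and it models each simple query only as "matches" or "does not match" for a single document. It builds the nested string, which has n-1 groups instead of n(n-1)/2, and compares the nested form by brute force against a direct "count the matches and require at least 2" test over every combination of matching clauses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene.
class NestedExpansion {
    // Builds (A AND (B OR C ...)) OR (B AND (C OR ...)) OR ...
    static String expand(List<String> clauses) {
        List<String> groups = new ArrayList<>();
        for (int i = 0; i < clauses.size() - 1; i++) {
            List<String> rest = clauses.subList(i + 1, clauses.size());
            groups.add("(" + clauses.get(i) + " AND (" + String.join(" OR ", rest) + "))");
        }
        return String.join(" OR ", groups);
    }

    // Evaluates the nested form for one document; m[i] says whether clause i matches.
    static boolean nestedMatches(boolean[] m) {
        for (int i = 0; i < m.length - 1; i++) {
            if (!m[i]) {
                continue;
            }
            for (int j = i + 1; j < m.length; j++) {
                if (m[j]) {
                    return true;
                }
            }
        }
        return false;
    }

    // The intended semantics: at least 2 clauses match.
    static boolean atLeastTwo(boolean[] m) {
        int count = 0;
        for (boolean b : m) {
            if (b) {
                count++;
            }
        }
        return count >= 2;
    }

    // Compares both forms on all 2^n match patterns of n clauses.
    static boolean equivalentForAll(int n) {
        for (int mask = 0; mask < (1 << n); mask++) {
            boolean[] m = new boolean[n];
            for (int k = 0; k < n; k++) {
                m[k] = (mask & (1 << k)) != 0;
            }
            if (nestedMatches(m) != atLeastTwo(m)) {
                return false;
            }
        }
        return true;
    }
}
```

For A, B and C, `expand` returns `(A AND (B OR C)) OR (B AND (C))`, and `equivalentForAll` returns true for any n: a document satisfies some group exactly when a matching clause has another matching clause after it.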

But this is still too slow, and there should be room for improvement:
if <A> matches but the subquery (<B> OR <C> OR ...) doesn't, evaluation
could stop, because that document can no longer reach 2 matches. The query
above, however, keeps evaluating the remaining groups. I could change the
query again to something like the following:

  (<A> AND (<B> OR <C> OR ...)) OR (NOT(<A>) AND (...))

But this would get pretty complex.
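Carrying the NOT(<A>) idea through to every clause gives a recursive rule: at least k of x1..xn match if either x1 matches and at least k-1 of the rest match, or x1 does not match and at least k of the rest match. A minimal sketch of evaluating that rule for one document (plain Java, hypothetical names, clauses again reduced to booleans) shows where evaluation can stop early: once k reaches 0, or once fewer than k clauses remain.

```java
// Hypothetical evaluator, not part of Lucene, for
//   atLeast(k, x1..xn) = (x1 AND atLeast(k-1, x2..xn))
//                     OR (NOT x1 AND atLeast(k, x2..xn))
// For a single document x1 is already known, so only one branch is followed.
class RecursiveAtLeast {
    static boolean atLeast(int k, boolean[] m, int from) {
        if (k <= 0) {
            return true;                      // enough clauses have matched
        }
        if (m.length - from < k) {
            return false;                     // too few clauses left to reach k
        }
        return m[from]
                ? atLeast(k - 1, m, from + 1) // x1 matches: need k-1 more
                : atLeast(k, m, from + 1);    // x1 does not match: still need k
    }

    static boolean atLeast(int k, boolean[] m) {
        return atLeast(k, m, 0);
    }
}
```

Written out as a query string, the rule repeats the remaining clauses in both branches, which is where the complexity comes from; evaluated per document, it reduces to a running count with two stopping conditions.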

Now I would like to create a subclass of Query (or MultiTermQuery?),
something like a MatchCountQuery, to which I can add my simple Query objects:

MatchCountQuery query = new MatchCountQuery();
query.setMinCount(2); // at least 2 matches
query.setMaxCount(-1); // no limit (I don't really need this)
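The intended semantics of such a MatchCountQuery can be pinned down with a small in-memory model. This is a hypothetical sketch, not a Lucene Query subclass: each clause is a plain `java.util.function.Predicate` over a document, and `add`, `setMinCount` and `setMaxCount` simply mirror the names proposed above, with -1 meaning "no upper limit". It counts matching clauses and stops as soon as the outcome is decided.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical in-memory model of the proposed MatchCountQuery semantics.
// Not a Lucene class: each clause is a predicate over a document of type D.
class MatchCountModel<D> {
    private final List<Predicate<D>> clauses = new ArrayList<>();
    private int minCount = 1;
    private int maxCount = -1; // -1 means no upper limit, as in the proposal

    void add(Predicate<D> clause) { clauses.add(clause); }
    void setMinCount(int minCount) { this.minCount = minCount; }
    void setMaxCount(int maxCount) { this.maxCount = maxCount; }

    boolean matches(D doc) {
        int count = 0;
        for (int i = 0; i < clauses.size(); i++) {
            if (clauses.get(i).test(doc)) {
                count++;
                // Above the upper limit: later clauses cannot undo that.
                if (maxCount >= 0 && count > maxCount) {
                    return false;
                }
                // Lower limit reached and no upper limit: the answer is settled.
                if (maxCount < 0 && count >= minCount) {
                    return true;
                }
            }
            // Even if every remaining clause matched, minCount is out of reach.
            int remaining = clauses.size() - i - 1;
            if (count + remaining < minCount) {
                return false;
            }
        }
        return count >= minCount && (maxCount < 0 || count <= maxCount);
    }
}
```

For example, with one clause each for John, James and Jack and `setMinCount(2)`, a document whose name terms are only {John} does not match, while one with {James, Jack} does.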

Has anyone ever done something like this? Or is there a simpler solution?

Thank you in advance for any help.

