lucene-java-user mailing list archives

From Chris Hostetter
Subject RE: Short circuit AND or subquerying in lucene for performance
Date Thu, 16 Feb 2012 23:13:32 GMT

: Is there a way to run a subquery in Lucene, i.e. running a query only on
: the result of a first query to avoid scanning the whole index ?
: Is it worth forwarding this request to the developers, do you think it
: is feasible to implement such a short circuit operator where the term is
: "late" evaluated only if the expression to the left evaluates to true to
: avoid scanning the index in its entirety ?

As Uwe said, the problem is still the wildcard suffix.

It doesn't matter whether the term scanning is early or late: the full
term enum (for that field) has to be scanned at least once no matter what.
The only possible optimization gained by doing a "late" scan of those
terms would be if the other mandatory clauses ("field1:foo" in your
example) didn't match any documents at all. Then there would certainly be
a benefit in not doing the scan, but I'm not sure it's worth the extra
effort for that special case...
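
To illustrate why the suffix wildcard forces a full scan, here is a
minimal sketch in plain Java. It does not use the Lucene API: the
TreeMap is only a toy stand-in for one field's sorted term dictionary,
and the class name, method names, and sample terms are all made up for
the example. A prefix can seek into the sorted terms and stop early,
while a suffix has to look at every term.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical toy model, not Lucene code.
class TermScanSketch {
    // Stand-in for the sorted term dictionary of field2: term -> doc ids.
    static final TreeMap<String, Set<Integer>> FIELD2 = new TreeMap<>();
    static {
        FIELD2.put("apple", Set.of(5));
        FIELD2.put("barn", Set.of(3));
        FIELD2.put("crowbar", Set.of(1, 4));
        FIELD2.put("foobar", Set.of(2));
        FIELD2.put("zebra", Set.of(4));
    }

    // Number of terms visited by the most recent scan.
    static int visited;

    // Prefix query (bar*): seek to the first candidate term and stop at
    // the first term that no longer starts with the prefix.
    static Set<Integer> prefixDocs(String prefix) {
        visited = 0;
        Set<Integer> docs = new TreeSet<>();
        for (Map.Entry<String, Set<Integer>> e : FIELD2.tailMap(prefix, true).entrySet()) {
            visited++;
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            docs.addAll(e.getValue());
        }
        return docs;
    }

    // Suffix query (*bar): sort order does not help, so every term in the
    // field has to be checked.
    static Set<Integer> suffixDocs(String suffix) {
        visited = 0;
        Set<Integer> docs = new TreeSet<>();
        for (Map.Entry<String, Set<Integer>> e : FIELD2.entrySet()) {
            visited++;
            if (e.getKey().endsWith(suffix)) {
                docs.addAll(e.getValue());
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        System.out.println("bar* -> " + prefixDocs("bar") + ", terms visited: " + visited);
        System.out.println("*bar -> " + suffixDocs("bar") + ", terms visited: " + visited);
    }
}
```

In this toy data the prefix scan touches two of the five terms and the
suffix scan touches all five, regardless of when in query evaluation the
scan is run.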

: > : Basically for queries such as field1:foo AND field2:*bar, I think it

Consider the situation where "field1:foo" only matches a single document
X, and imagine you have changed the code the way you are describing, so
that no term enumeration on field2 for "field2:*bar" happens unless
absolutely necessary. As soon as field1:foo matches on document X, we
still have to enumerate every term on field2, checking whether that term
also matches against document X. We can't even stop once we find a
matching term, because the scoring model may want to score that clause
higher if multiple terms with the suffix "bar" match against document X.
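
A rough sketch of that "late" evaluation, again in plain Java rather
than Lucene code. The class, the lateSuffixMatch method, and the sample
terms are hypothetical, and counting matching terms per document is only
a simplified stand-in for whatever the scoring model does. The point is
that the scan is skipped only when the candidate set is empty; with even
one candidate document, every term is still visited.

```java
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical toy model, not Lucene code.
class LateSuffixScanSketch {
    // Number of field2 terms visited by the most recent call.
    static int termsVisited;

    // candidates: docs already matched by the other mandatory clause.
    // Returns doc id -> number of suffix-matching terms in that doc.
    static Map<Integer, Integer> lateSuffixMatch(Set<Integer> candidates,
                                                 SortedMap<String, Set<Integer>> field2,
                                                 String suffix) {
        termsVisited = 0;
        Map<Integer, Integer> matchCounts = new TreeMap<>();
        if (candidates.isEmpty()) {
            return matchCounts; // the one case where the scan is avoided
        }
        for (Map.Entry<String, Set<Integer>> e : field2.entrySet()) {
            termsVisited++;
            if (!e.getKey().endsWith(suffix)) {
                continue;
            }
            // No early exit: a later term may match the same doc again,
            // and the count can matter for scoring.
            for (int doc : e.getValue()) {
                if (candidates.contains(doc)) {
                    matchCounts.merge(doc, 1, Integer::sum);
                }
            }
        }
        return matchCounts;
    }

    // Sample term dictionary for field2; doc 7 plays the role of document X.
    static SortedMap<String, Set<Integer>> sampleField2() {
        SortedMap<String, Set<Integer>> field2 = new TreeMap<>();
        field2.put("barn", Set.of(7));
        field2.put("crowbar", Set.of(7));
        field2.put("foobar", Set.of(7, 9));
        field2.put("zebra", Set.of(2));
        return field2;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> hit = lateSuffixMatch(Set.of(7), sampleField2(), "bar");
        System.out.println("one candidate: " + hit + ", terms visited: " + termsVisited);
        Map<Integer, Integer> none = lateSuffixMatch(Set.of(), sampleField2(), "bar");
        System.out.println("no candidates: " + none + ", terms visited: " + termsVisited);
    }
}
```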


So even though it might be possible to optimize the edge case of no
matching documents, I'm not sure whether that optimization would slow
down the more common case.

