lucene-java-user mailing list archives

From David Causse <>
Subject TimeLimitingCollector accuracy
Date Wed, 21 Dec 2016 12:27:51 GMT

This subject has been discussed in the past, but I don't think any 
real solution has been implemented yet.

Here is a small test case to illustrate the problem:

This test will print:

Time waited on a slow query that matches all docs: 1109
Time waited on a slow query that matches no docs: 137258

The problem is that the time check is "passive": it only runs when the 
collector is invoked. On large segments, if the query is slow and 
matches no documents, the timeout is very inaccurate, which makes it 
nearly impossible to tune the client timeout against the collector 
timeout.
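
To make the effect concrete, here is a minimal sketch in plain Java. It
is a toy model, not Lucene code: the class PassiveTimeoutDemo, its scan
method and the tick-based fake clock are all assumptions made for
illustration. It models a single large segment where the deadline is
only compared against the clock when a hit is collected.

```java
import java.util.function.IntPredicate;

/** Toy model (not Lucene code) of a timeout that is only checked when a hit is collected. */
final class PassiveTimeoutDemo {

    /** Thrown by the toy collector once the time budget is spent. */
    static final class TimeoutException extends RuntimeException {}

    /**
     * Scans docs 0..maxDoc-1 of one segment. Evaluating each doc costs one tick
     * of a fake clock. The deadline is only compared against the clock inside
     * the "collect" branch, mirroring a check that lives in the collector.
     * Returns the number of ticks spent before the scan stopped.
     */
    static long scan(int maxDoc, long budgetTicks, IntPredicate matches) {
        long clock = 0;
        try {
            for (int doc = 0; doc < maxDoc; doc++) {
                clock++; // cost of evaluating the (slow) query on this doc
                if (matches.test(doc)) {
                    // collect(doc): the only place where the timeout is checked
                    if (clock > budgetTicks) {
                        throw new TimeoutException();
                    }
                }
            }
        } catch (TimeoutException e) {
            // search aborted; partial results would be returned here
        }
        return clock;
    }

    public static void main(String[] args) {
        int maxDoc = 1_000_000;
        long budget = 1_000;
        System.out.println("matches all docs: " + scan(maxDoc, budget, doc -> true));
        System.out.println("matches no docs:  " + scan(maxDoc, budget, doc -> false));
    }
}
```

With a budget of 1000 ticks, the match-all scan stops right after the
budget is spent, while the match-nothing scan runs through the whole
segment because the check is never reached.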

This happens to me with a query that implements a TwoPhaseIterator 
whose approximation can be really bad, not to say completely wrong 
(regex search on stored content with an approximation based on extracted 

Another problem I discovered is that if the query is accepted by the 
QueryCache, the cache will eagerly build its bitset, bypassing the 
Collector.

Reading I 
see that one suggested solution was to move the timeout check to a lower 
level (in the scorers), but it raised some concerns about checking the 
timeout too frequently.
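
One common way to limit the cost of a frequent check is to read the
clock only once every N calls. The sketch below is plain Java under
stated assumptions: AmortizedDeadline, its interval parameter and the
fake clock in main are hypothetical names, not part of Lucene.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper: reads the clock only once every `interval` calls. */
final class AmortizedDeadline {
    private final LongSupplier clock; // e.g. System::nanoTime
    private final long deadline;      // absolute time, in the clock's unit
    private final int interval;       // calls between two clock reads
    private int calls = 0;
    private boolean expired = false;
    long clockReads = 0;              // exposed only to show the cost

    AmortizedDeadline(LongSupplier clock, long budget, int interval) {
        this.clock = clock;
        this.deadline = clock.getAsLong() + budget;
        this.interval = interval;
    }

    /** Cheap on most calls: a counter increment and a comparison. */
    boolean isExpired() {
        if (!expired && ++calls >= interval) {
            calls = 0;
            clockReads++;
            // subtraction keeps the comparison safe if the clock value wraps around
            expired = clock.getAsLong() - deadline >= 0;
        }
        return expired;
    }

    public static void main(String[] args) {
        long[] now = {0}; // fake clock: one tick per document examined
        AmortizedDeadline deadline = new AmortizedDeadline(() -> now[0], 1_000, 256);
        int docs = 0;
        while (docs < 1_000_000) {
            docs++;
            now[0]++;
            if (deadline.isExpired()) {
                break;
            }
        }
        System.out.println("docs examined: " + docs + ", clock reads: " + deadline.clockReads);
    }
}
```

The trade-off is that the overshoot is bounded by the interval: the
deadline is noticed at most `interval` calls late, in exchange for far
fewer clock reads.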

But given that some effort has been made to separate sub scorers from 
"top-level" scorers (see would it make sense 
now to make BulkScorers aware of some time constraints?

On my side, as a workaround to prevent catastrophes, I'll probably 
keep implementing a circuit breaker in my TwoPhaseIterator#matches 
that stops the costly operations, either by returning false or by 
throwing an exception.
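
A rough sketch of such a circuit breaker, in plain Java. It is a toy
model, not my actual query and not Lucene code: CostlyCheck stands in
for the expensive confirmation step behind TwoPhaseIterator#matches,
and BreakerDemo, CircuitBreaker, BudgetExceededException and the fake
clock are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

/** Toy model of a circuit breaker around an expensive confirmation step. */
final class BreakerDemo {

    /** Stand-in for the costly second phase (e.g. a regex run on stored content). */
    interface CostlyCheck {
        boolean matches(int doc);
    }

    /** Hypothetical exception type owned by the breaker. */
    static final class BudgetExceededException extends RuntimeException {
        BudgetExceededException(String message) {
            super(message);
        }
    }

    static final class CircuitBreaker implements CostlyCheck {
        private final CostlyCheck delegate;
        private final LongSupplier clock;
        private final long deadline;
        private final boolean failFast; // true: throw, false: report "no match"

        CircuitBreaker(CostlyCheck delegate, LongSupplier clock, long budget, boolean failFast) {
            this.delegate = delegate;
            this.clock = clock;
            this.deadline = clock.getAsLong() + budget;
            this.failFast = failFast;
        }

        @Override
        public boolean matches(int doc) {
            if (clock.getAsLong() - deadline >= 0) {
                if (failFast) {
                    throw new BudgetExceededException("budget exhausted at doc " + doc);
                }
                return false; // skip the costly work; the doc is treated as a non-match
            }
            return delegate.matches(doc);
        }
    }

    /** Checks docs 0..maxDoc-1 and returns the hits found before the breaker tripped. */
    static List<Integer> search(int maxDoc, CostlyCheck check) {
        List<Integer> hits = new ArrayList<>();
        try {
            for (int doc = 0; doc < maxDoc; doc++) {
                if (check.matches(doc)) {
                    hits.add(doc);
                }
            }
        } catch (BudgetExceededException e) {
            // keep the partial hits collected so far
        }
        return hits;
    }

    public static void main(String[] args) {
        long[] now = {0}; // fake clock: each costly call takes 10 ticks
        CostlyCheck slow = doc -> {
            now[0] += 10;
            return doc % 2 == 0;
        };
        CostlyCheck guarded = new CircuitBreaker(slow, () -> now[0], 100, true);
        System.out.println("partial hits: " + search(1_000, guarded));
    }
}
```

Returning false keeps the search running but silently drops later
matches, whereas throwing stops the search at once and lets the caller
decide what to do with the partial hits.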

Lastly, I think it would help me work around this problem if the 
constructor of TimeExceededException were public. Is there any reason 
for this constructor to be private? Would it break important workflows 
if a scorer started throwing this exception? It would allow me to still 
return partial results.

Thanks for your help
