lucene-dev mailing list archives

From "Sergey Vladimirov (JIRA)" <>
Subject [jira] Updated: (LUCENE-2362) Add support for slow filters with batch processing
Date Sat, 03 Apr 2010 12:03:27 GMT


Sergey Vladimirov updated LUCENE-2362:

    Attachment:     (was:

> Add support for slow filters with batch processing
> --------------------------------------------------
>                 Key: LUCENE-2362
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 3.0.1
>            Reporter: Sergey Vladimirov
>         Attachments:
> The internal implementation of IndexSearcher assumes that the Filter and the scorer have
roughly equal performance. In our environment, however, we have a Filter implementation that
is very expensive compared to the scorer.
> If we have, say, 2k termdocs selected by the scorer (each ~250 docs) and 2k selected
by the filter, then 250k docs will be checked (and filtered out) quickly by the scorer, and
250k docs will be checked slowly by our filter.
> A straightforward implementation pushes search time past 60 seconds per query, because
each next() or advance() call requires N database queries PER CHECKED DOC. A read-ahead
technique lets us reduce this to 35 seconds per query, which is still too slow.
> The solution is to first select all documents with the scorer and then filter them in a
single batch with our filter. An example implementation (with BitSet) is in the attachment.
It currently takes only ~300 milliseconds per query.
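
The attached patch is not reproduced in this message, so the following is only a minimal sketch of the two-phase idea described above, not the reporter's implementation. It uses only java.util.BitSet; the names BatchFilterSketch, BatchFilter, collectCandidates and search are hypothetical, and the scorer is stood in for by a plain array of matching doc IDs.

```java
import java.util.BitSet;

// Hypothetical sketch: none of these names come from Lucene or from the attached patch.
final class BatchFilterSketch {

    /** Assumed shape of a slow filter that can judge many docs in one call. */
    interface BatchFilter {
        /** Returns the subset of candidate doc IDs that pass the filter. */
        BitSet accept(BitSet candidates);
    }

    /** Phase 1: record every doc ID matched by the cheap scorer. */
    static BitSet collectCandidates(int[] scorerDocIds, int maxDoc) {
        BitSet candidates = new BitSet(maxDoc);
        for (int docId : scorerDocIds) {
            candidates.set(docId);
        }
        return candidates;
    }

    /** Phase 2: call the expensive filter once for the whole candidate set. */
    static BitSet search(int[] scorerDocIds, int maxDoc, BatchFilter filter) {
        BitSet candidates = collectCandidates(scorerDocIds, maxDoc);
        BitSet accepted = filter.accept(candidates);

        // Intersect defensively so a filter that returns extra bits
        // cannot add docs the scorer never matched.
        BitSet result = (BitSet) candidates.clone();
        result.and(accepted);
        return result;
    }
}
```

The point of this shape is that the filter is invoked once per query instead of once per checked document, so a database-backed filter could answer with a small number of bulk queries. Within Lucene itself, the candidate doc IDs would presumably be gathered by a custom Collector rather than passed in as an array.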

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
