lucene-dev mailing list archives

From "Trejkaz (JIRA)" <>
Subject [jira] Commented: (LUCENE-2348) DuplicateFilter incorrectly handles multiple calls to getDocIdSet for segment readers
Date Wed, 02 Jun 2010 22:53:57 GMT


Trejkaz commented on LUCENE-2348:

That change broke nearly all of our own filters.  Many of them get their data from a database
whose IDs are top-level reader doc IDs rather than per-segment doc IDs.  I noticed the
DuplicateFilter in contrib while reading about how the Filter API had changed: I went looking
for an example of a filter which (in theory :)) would have worked the same way, so that I
could borrow its solution, and found that it was making the same assumptions we were.

Our workaround was the same as described: pass the top-level reader into the constructor,
compute the doc ID set for that reader, then split it up, doing the maths to create the
sub-set for each segment reader.
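
For anyone following along, the "maths" is roughly the following.  This is only a minimal
sketch in plain Java using java.util.BitSet, not our actual filter and not Lucene API: the
class name SegmentDocIdSlicer and both method names are made up for illustration, and it
assumes you already know each segment's maxDoc (and hence its starting offset, docBase, in
the top-level reader).

```java
import java.util.BitSet;

// Hypothetical helper for illustration only; none of these names exist in Lucene.
final class SegmentDocIdSlicer {
    private SegmentDocIdSlicer() {}

    // Assumes segments are laid out back to back in the top-level reader,
    // so each segment's docBase is the sum of the maxDocs before it.
    static int[] docBases(int[] segmentMaxDocs) {
        int[] bases = new int[segmentMaxDocs.length];
        int base = 0;
        for (int i = 0; i < segmentMaxDocs.length; i++) {
            bases[i] = base;
            base += segmentMaxDocs[i];
        }
        return bases;
    }

    // Copies the bits in [docBase, docBase + maxDoc) of the top-level set
    // into a new set, shifted down so they become segment-local doc IDs.
    static BitSet sliceForSegment(BitSet topLevelDocs, int docBase, int maxDoc) {
        BitSet local = new BitSet(maxDoc);
        int end = docBase + maxDoc;
        for (int doc = topLevelDocs.nextSetBit(docBase);
             doc >= 0 && doc < end;
             doc = topLevelDocs.nextSetBit(doc + 1)) {
            local.set(doc - docBase);
        }
        return local;
    }
}
```

In a real filter, the top-level set would be computed once in the constructor, and
getDocIdSet() would have to work out which segment it was handed before returning the
matching slice.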

The downside is that now you can only use this Filter instance with this reader, whereas the
original DuplicateFilter would have worked on multiple top-level readers happily.

Having the top reader passed in before each sub-reader sounds like a good idea.  It might
make it possible for the same filter instance to support multiple top-level readers as well.

> DuplicateFilter incorrectly handles multiple calls to getDocIdSet for segment readers
> -------------------------------------------------------------------------------------
>                 Key: LUCENE-2348
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/*
>    Affects Versions: 2.9.2
>            Reporter: Trejkaz
> DuplicateFilter currently works by building a single doc ID set, without taking into
> account that getDocIdSet() will be called once per segment and only with each segment's local
