lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <>
Subject Re: DocIdSet to represent small numberr of hits in large Document set
Date Tue, 05 Apr 2011 10:19:59 GMT
Can we simply factor out (poach!) those useful-sounding classes from
Nutch into Lucene?


On Tue, Apr 5, 2011 at 2:24 AM, Antony Bowesman <> wrote:
> I'm converting a Lucene 2.3.2 to 2.4.1 (with a view to going to 2.9.4).
> Many of our indexes are 5M+ Documents, however, only a small subset of these
> are relevant to any user.  As a DocIdSet, backed by a BitSet or OpenBitSet,
> is rather inefficient in terms of memory use, what is the recommended way to
> DocIdSet implementation to use in this scenario?
> Seems like SortedVIntList can be used to store the info, but it has no
> methods to build the list in the first place, requiring an array or bitset
> in the constructor.
> I had used Nutch's DocSet and HashDocSet implementations in my 2.3.2
> deployment, but want to move away from that Nutch dependency, so wondered if
> Lucene had a way to do this?
> Thanks
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message