lucene-java-user mailing list archives

From Cristian Lorenzetto <>
Subject Re: docid is just a signed int32
Date Fri, 19 Aug 2016 13:56:09 GMT
ah :)

"with 3TB of ram (we have these running), int64 for >2^32 documents in a
single index should not be a problem"

Maybe I am reasoning the wrong way, but normally the size of storage is not
the size of memory.
I don't know Lucene in depth, but I would expect a Lucene index to be
scanned block by block, not held entirely in memory. For this reason, in a
previous post I mentioned the possibility of using an iterator instead of an
array: an array loads all the results into memory, whereas an iterator
loads a single document (or a fixed number of them) at each step. Only in
the case where you call loadAll() is there a problem with memory.
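
To make the iterator idea concrete, here is a minimal sketch in plain Java. It does not use Lucene at all: the class name BatchedHits and the batch-filling loop are hypothetical stand-ins for reading the next block of results from storage. The point is only that memory use is bounded by the batch size, even when the total hit count is a long that exceeds the int range.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical sketch: hands out hit ids in fixed-size batches instead of one big array. */
final class BatchedHits implements Iterator<long[]> {
    private final long totalHits; // a long, so it may exceed Integer.MAX_VALUE
    private final int batchSize;  // upper bound on what is held in memory at once
    private long cursor = 0;      // id of the next hit to hand out

    BatchedHits(long totalHits, int batchSize) {
        if (totalHits < 0 || batchSize <= 0) {
            throw new IllegalArgumentException("totalHits must be >= 0 and batchSize > 0");
        }
        this.totalHits = totalHits;
        this.batchSize = batchSize;
    }

    @Override
    public boolean hasNext() {
        return cursor < totalHits;
    }

    @Override
    public long[] next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        // The last batch may be shorter than batchSize.
        int n = (int) Math.min((long) batchSize, totalHits - cursor);
        long[] batch = new long[n];
        for (int i = 0; i < n; i++) {
            batch[i] = cursor + i; // stand-in for reading one result from storage
        }
        cursor += n;
        return batch;
    }

    public static void main(String[] args) {
        BatchedHits hits = new BatchedHits(10, 4);
        while (hits.hasNext()) {
            System.out.println(Arrays.toString(hits.next()));
        }
    }
}
```

Calling next() until hasNext() returns false visits every hit once; a loadAll()-style method would instead need an array as large as the whole result set.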

2016-08-19 15:39 GMT+02:00, Glen Newton <>:
> Making docid an int64 is a non-trivial undertaking, and this work needs to
> be compared against the use cases and how compelling they are.
> That said, in the lifetime of most software projects a decision is made to
> break backward compatibility to move the project forward.
> When/if moving to int64 happens, it will be one of these moments. It is not
> a Bad Thing (necessarily).  :-)
> And for use cases, if I am running a commercial JVM on a 64 core machine
> with 3TB of ram (we have these running), int64 for >2^32 documents in a
> single index should not be a problem...  :-)
> glen
> On Fri, Aug 19, 2016 at 4:43 AM, Adrien Grand <> wrote:
>> On Fri, Aug 19, 2016 at 03:32, Trejkaz <> wrote:
>> > But hang on:
>> > * TopDocs#merge still returns a TopDocs.
>> > * TopDocs still uses an array of ScoreDoc.
>> > * ScoreDoc still uses an int doc ID.
>> >
>> This is why ScoreDoc has a `shardIndex` so that you can know which index a
>> document comes from.
>> I'm not saying we should not switch to long doc ids, but as outlined in
>> some other responses it would be a challenging change.
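
As an illustration of the point above, a shard number plus a per-index int doc ID already identifies a document uniquely across several indexes, and the pair fits in one long. The sketch below is plain Java with no Lucene dependency; the class GlobalDocId and its methods are hypothetical helpers, not part of Lucene. In Lucene's ScoreDoc the corresponding field is named shardIndex.

```java
/** Hypothetical helper: packs (shardIndex, doc) into a single 64-bit id. */
final class GlobalDocId {
    private GlobalDocId() {}

    /** High 32 bits hold the shard, low 32 bits hold the per-index doc ID. */
    static long pack(int shardIndex, int doc) {
        // Mask the doc ID so its sign bit cannot leak into the shard bits.
        return ((long) shardIndex << 32) | (doc & 0xFFFFFFFFL);
    }

    /** Recovers the shard number from a packed id. */
    static int shardIndex(long id) {
        return (int) (id >>> 32);
    }

    /** Recovers the per-index doc ID from a packed id. */
    static int doc(long id) {
        return (int) id;
    }

    public static void main(String[] args) {
        long id = pack(3, Integer.MAX_VALUE);
        System.out.println(shardIndex(id) + " / " + doc(id));
    }
}
```

This only addresses naming a document across shards; each individual index is still limited to int doc IDs.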
