lucene-java-user mailing list archives

From Cristian Lorenzetto <cristian.lorenze...@gmail.com>
Subject Re: docid is just a signed int32
Date Fri, 19 Aug 2016 13:56:09 GMT
ah :)

"with 3TB of ram (we have these running), int64 for >2^32 documents in a
single index should not be a problem"

Maybe I'm reasoning about this the wrong way, but normally the size of
storage is not the size of memory.
I don't know Lucene in depth, but I would expect a Lucene index to be
scanned block by block, not loaded entirely into memory. For this
reason, in a previous post I mentioned the possibility of using an
iterator instead of an array: an array loads all the results into
memory, whereas an iterator loads a single document (or a fixed number
of them) at each step. Memory only becomes a problem if you call
something like loadAll().
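
To make the array-versus-iterator point concrete, here is a minimal
sketch. It does not use any Lucene API: BatchedHitIterator and countAll
are hypothetical names, and the loop that fills each batch only stands
in for reading a block from storage. The hit count is a long, so the
total can exceed the int32 range while only one batch is in memory.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical sketch: yields hit IDs in fixed-size batches instead of one big array. */
final class BatchedHitIterator implements Iterator<long[]> {
    private final long totalHits;  // a long, so it may exceed Integer.MAX_VALUE
    private final int batchSize;   // upper bound on IDs held in memory at once
    private long position = 0;     // ID of the next hit to hand out

    BatchedHitIterator(long totalHits, int batchSize) {
        if (totalHits < 0 || batchSize <= 0) {
            throw new IllegalArgumentException("totalHits must be >= 0 and batchSize > 0");
        }
        this.totalHits = totalHits;
        this.batchSize = batchSize;
    }

    @Override
    public boolean hasNext() {
        return position < totalHits;
    }

    @Override
    public long[] next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        int n = (int) Math.min((long) batchSize, totalHits - position);
        long[] batch = new long[n];
        for (int i = 0; i < n; i++) {
            batch[i] = position + i;  // stand-in for reading one block from storage
        }
        position += n;
        return batch;
    }

    /** Consumes the iterator and counts hits, holding at most one batch at a time. */
    static long countAll(BatchedHitIterator it) {
        long count = 0;
        while (it.hasNext()) {
            count += it.next().length;
        }
        return count;
    }
}
```

A loadAll()-style call would instead need a single array of totalHits
entries, which a Java array cannot hold once the count passes
Integer.MAX_VALUE.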




2016-08-19 15:39 GMT+02:00, Glen Newton <glen.newton@gmail.com>:
> Making docid an int64 is a non-trivial undertaking, and this work needs to
> be compared against the use cases and how compelling they are.
>
> That said, in the lifetime of most software projects a decision is made to
> break backward compatibility to move the project forward.
> When/if moving to int64 happens, it will be one of these moments. It is not
> a Bad Thing (necessarily).  :-)
>
> And for use cases, if I am running a commercial JVM on a 64 core machine
> with 3TB of ram (we have these running), int64 for >2^32 documents in a
> single index should not be a problem...  :-)
>
> glen
>
> On Fri, Aug 19, 2016 at 4:43 AM, Adrien Grand <jpountz@gmail.com> wrote:
>
>> On Fri, 19 Aug 2016 at 03:32, Trejkaz <trejkaz@trypticon.org> wrote:
>>
>> > But hang on:
>> > * TopDocs#merge still returns a TopDocs.
>> > * TopDocs still uses an array of ScoreDoc.
>> > * ScoreDoc still uses an int doc ID.
>> >
>>
>> This is why ScoreDoc has a `shardIndex` so that you can know which index a
>> document comes from.
>>
>> I'm not saying we should not switch to long doc ids, but as outlined in
>> some other responses it would be a challenging change.
>>
>
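
As an illustration of the shard point in Adrien's reply: a shard number
together with a per-shard int doc ID already identifies more than 2^32
documents across several indexes. The sketch below packs the two values
into one long. GlobalDocId and its methods are hypothetical helpers, not
part of Lucene, and the code assumes both values are non-negative ints
such as those exposed by the public doc and shard fields of ScoreDoc
(the shard field is named shardIndex in the Lucene versions I am aware
of).

```java
/** Hypothetical helper: packs a shard number and a per-shard int doc ID into one long. */
final class GlobalDocId {
    private GlobalDocId() {}

    /** High 32 bits hold the shard number, low 32 bits hold the per-shard doc ID. */
    static long pack(int shardIndex, int doc) {
        return ((long) shardIndex << 32) | (doc & 0xFFFFFFFFL);
    }

    /** Recovers the shard number from a packed ID. */
    static int shardIndex(long globalId) {
        return (int) (globalId >>> 32);
    }

    /** Recovers the per-shard doc ID from a packed ID. */
    static int doc(long globalId) {
        return (int) globalId;
    }
}
```

For example, GlobalDocId.pack(3, 42) round-trips to shard 3 and doc 42.
This only gives a caller a 64-bit handle; each underlying index is still
limited to int doc IDs.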


