lucene-java-user mailing list archives

From Glen Newton <glen.new...@gmail.com>
Subject Re: docid is just a signed int32
Date Fri, 19 Aug 2016 15:57:39 GMT
I was referring to memory (RAM).

We have machines running right now with 1TB _RAM_ and will be getting
machines with 3TB RAM (Dell R830 with 48 64GB DIMMs) (Sorry, I was
incorrect when I said we were running the 3TB machines _now_).

Glen



On Fri, Aug 19, 2016 at 9:56 AM, Cristian Lorenzetto <
cristian.lorenzetto@gmail.com> wrote:

> ah :)
>
> "with 3TB of ram (we have these running), int64 for >2^32 documents in a
> single index should not be a problem"
>
> Maybe I am reasoning badly, but normally the size of storage is not
> the size of memory.
> I don't know Lucene in depth, but I would expect Lucene to scan an
> index block by block, not hold it all in memory. For this reason, in a
> previous post I mentioned the possibility of using an iterator instead
> of an array: an array loads all the results into memory, whereas an
> iterator loads a single document (or a fixed number of them) at each
> step. If you call loadAll(), there is a problem with memory.
>
>
>
>
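The array-versus-iterator point above can be sketched in plain Java, without Lucene. Everything here is hypothetical and only illustrates the memory argument: the class `LazyHits` and the methods `loadAll` and `iterate` are made-up names, not part of any Lucene API, and the "hits" are just a contiguous range of long ids.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

public class LazyHits {
    /** Eager: materializes every id in [from, to) in one array. */
    static long[] loadAll(long from, long to) {
        // A Java array is indexed by int, so the result size is capped
        // near 2^31 entries and all of it lives in memory at once.
        long[] ids = new long[Math.toIntExact(to - from)];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = from + i;
        }
        return ids;
    }

    /** Lazy: yields one id at a time, so memory use stays constant. */
    static Iterator<Long> iterate(long from, long to) {
        return new Iterator<Long>() {
            private long next = from;

            @Override
            public boolean hasNext() {
                return next < to;
            }

            @Override
            public Long next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return next++;
            }
        };
    }

    public static void main(String[] args) {
        // Ids above 2^32 are fine for the iterator: only one is held at a time.
        Iterator<Long> it = iterate(4_000_000_000L, 4_000_000_003L);
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

The iterator never needs the total number of results to fit in an int, whereas `loadAll` fails with an `ArithmeticException` as soon as the range is longer than `Integer.MAX_VALUE`.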
> 2016-08-19 15:39 GMT+02:00, Glen Newton <glen.newton@gmail.com>:
> > Making docid an int64 is a non-trivial undertaking, and this work needs to
> > be compared against the use cases and how compelling they are.
> >
> > That said, in the lifetime of most software projects a decision is made to
> > break backward compatibility to move the project forward.
> > When/if moving to int64 happens, it will be one of these moments. It is not
> > a Bad Thing (necessarily).  :-)
> >
> > And for use cases, if I am running a commercial JVM on a 64 core machine
> > with 3TB of ram (we have these running), int64 for >2^32 documents in a
> > single index should not be a problem...  :-)
> >
> > glen
> >
> > On Fri, Aug 19, 2016 at 4:43 AM, Adrien Grand <jpountz@gmail.com> wrote:
> >
> >> On Fri, Aug 19, 2016 at 03:32, Trejkaz <trejkaz@trypticon.org> wrote:
> >>
> >> > But hang on:
> >> > * TopDocs#merge still returns a TopDocs.
> >> > * TopDocs still uses an array of ScoreDoc.
> >> > * ScoreDoc still uses an int doc ID.
> >> >
> >>
> >> This is why ScoreDoc has a `shardIndex` so that you can know which index a
> >> document comes from.
> >>
> >> I'm not saying we should not switch to long doc ids, but as outlined in
> >> some other responses it would be a challenging change.
> >>
> >
>
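As a rough illustration of the shard idea discussed in the quoted thread: if each hit carries an int shard number plus an int per-index doc ID (in Lucene, `ScoreDoc` exposes these as the public fields `shardIndex` and `doc`), an application can combine the two into one 64-bit identifier on its own side. The sketch below does not use Lucene; the class `GlobalDocId` and its packing scheme are assumptions made for illustration, not something Lucene provides.

```java
public class GlobalDocId {
    /** Packs a shard number (high 32 bits) and a per-shard doc ID (low 32 bits). */
    static long encode(int shardIndex, int doc) {
        if (shardIndex < 0 || doc < 0) {
            throw new IllegalArgumentException("shardIndex and doc must be non-negative");
        }
        // Mask the low word so the int-to-long promotion cannot sign-extend.
        return ((long) shardIndex << 32) | (doc & 0xFFFFFFFFL);
    }

    /** Recovers the shard number from a packed id. */
    static int shardIndex(long globalId) {
        return (int) (globalId >>> 32);
    }

    /** Recovers the per-shard doc ID from a packed id. */
    static int doc(long globalId) {
        return (int) globalId;
    }

    public static void main(String[] args) {
        long id = encode(3, 42);
        System.out.println("shard=" + shardIndex(id) + " doc=" + doc(id));
    }
}
```

With this scheme each individual index still stays within the int32 doc ID limit, while the combined id space across shards is 64 bits wide.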
