lucene-java-user mailing list archives

From Cristian Lorenzetto <cristian.lorenze...@gmail.com>
Subject Re: docid is just a signed int32
Date Thu, 18 Aug 2016 15:57:36 GMT
Databases normally support at least a long primary key.
Ask Twitter, for example: their data grows by more than 4 petabytes every
year :) Maybe they use storage devices bigger than a PC's disk :)
However, if you offer the possibility of using shards, that is an option
anyway :)
That is why my suggestion was different: it was not about the size of the
repository, but about the size of the search result :):):)

"A suggestion for a possible future change is to use an Iterator instead of
a Java array. An Iterator is a more scalable ADT that does not consume
memory to return documents."

It is just a suggestion anyway, for my beloved Lucene :):)
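
To make the suggestion concrete, here is a minimal sketch in plain Java. It does not use any Lucene API: the class name LazyHits and the "every step-th id matches" rule are hypothetical, chosen only to show that an Iterator can hand out long ids one at a time without allocating an array whose length is limited to an int.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/**
 * Hypothetical sketch (not a Lucene class): yields matching ids lazily.
 * Here an id "matches" if it is start, start + step, start + 2*step, ...
 */
final class LazyHits implements Iterator<Long> {
    private final long end;   // exclusive upper bound for ids
    private final long step;  // distance between two matching ids
    private long cursor;      // next id to return

    LazyHits(long start, long end, long step) {
        if (step <= 0) {
            throw new IllegalArgumentException("step must be positive");
        }
        this.cursor = start;
        this.end = end;
        this.step = step;
    }

    @Override
    public boolean hasNext() {
        return cursor < end;
    }

    @Override
    public Long next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        long current = cursor;
        cursor += step; // only the cursor is kept in memory, never the full result
        return current;
    }

    public static void main(String[] args) {
        // Ids go past Integer.MAX_VALUE, which an int-indexed array could not address.
        LazyHits hits = new LazyHits(0L, 5_000_000_000L, 1_000_000_000L);
        long count = 0;
        long last = -1;
        while (hits.hasNext()) {
            last = hits.next();
            count++;
        }
        System.out.println(count + " hits, last id = " + last);
    }
}
```

The caller holds only one id at a time, so memory use does not depend on how many documents match.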


2016-08-18 17:43 GMT+02:00 Greg Bowyer <gbowyer@fastmail.co.uk>:

> What are you trying to index that has more than 3 billion documents per
> shard / index and cannot be split as Adrien suggests?
>
>
>
> On Thu, Aug 18, 2016, at 07:35 AM, Cristian Lorenzetto wrote:
> > Maybe Lucene has a maximum size of 2^31 because result sets are Java
> > arrays, whose length is an int.
> > A suggestion for a possible future change is to use an Iterator instead
> > of a Java array. An Iterator is a more scalable ADT that does not consume
> > memory to return documents.
> >
> >
> > 2016-08-18 16:03 GMT+02:00 Glen Newton <glen.newton@gmail.com>:
> >
> > > Or maybe it is time Lucene re-examined this limit.
> > >
> > > There are use cases out there where >2^31 does make sense in a single
> > > index (huge number of tiny docs).
> > >
> > > Also, I think the underlying hardware and the JDK have advanced to make
> > > this more defendable.
> > >
> > > Constructively,
> > > Glen
> > >
> > >
> > > On Thu, Aug 18, 2016 at 9:55 AM, Adrien Grand <jpountz@gmail.com> wrote:
> > >
> > > > No, IndexWriter enforces that the number of documents cannot go over
> > > > IndexWriter.MAX_DOCS (which is a bit less than 2^31) and
> > > > BaseCompositeReader computes the number of documents in a long
> > > > variable and ensures it is less than 2^31, so you cannot have indexes
> > > > that contain more than 2^31 documents.
> > > >
> > > > Larger collections should be written to multiple shards and use
> > > > TopDocs.merge to merge results.
> > > >
> > > > On Thu, Aug 18, 2016 at 15:38, Cristian Lorenzetto <
> > > > cristian.lorenzetto@gmail.com> wrote:
> > > >
> > > > > docid is a signed int32, so it is not that big, but docid does not
> > > > > really seem to be an immutable primary key; rather, it is a
> > > > > temporary id for the view related to a specific search.
> > > > >
> > > > > So a repository can contain more than 2^31 documents.
> > > > >
> > > > > Is my deduction correct? Is there a maximum size for a Lucene index?
> > > > >
> > > >
> > >
>
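
For reference, the multi-shard approach Adrien describes in the quoted thread (several indexes, each under the per-index limit, with results merged afterwards) can be sketched roughly as follows. This is plain Java and deliberately does not call TopDocs.merge or any other Lucene API. The names ShardMergeSketch, Hit, globalId and mergeTopK are hypothetical, and packing the shard number into the high 32 bits of a long is just one illustrative way to get an id that is unique across shards, not something Lucene does.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: merge per-shard hits and give each a 64-bit global id. */
final class ShardMergeSketch {

    /** One hit: the shard it came from, its int docid inside that shard, its score. */
    static final class Hit {
        final int shard;
        final int doc;
        final float score;

        Hit(int shard, int doc, float score) {
            this.shard = shard;
            this.doc = doc;
            this.score = score;
        }

        /** Assumed scheme: shard number in the high 32 bits, per-shard docid in the low 32. */
        long globalId() {
            return ((long) shard << 32) | (doc & 0xFFFFFFFFL);
        }
    }

    /** Returns the k best hits over all shards, highest score first. */
    static List<Hit> mergeTopK(List<List<Hit>> perShard, int k) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perShard) {
            all.addAll(hits);
        }
        // Sort by descending score; break ties by global id so the order is stable.
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed()
                .thenComparingLong(h -> h.globalId()));
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }

    public static void main(String[] args) {
        List<Hit> shard0 = List.of(new Hit(0, 5, 0.9f), new Hit(0, 7, 0.4f));
        List<Hit> shard1 = List.of(new Hit(1, 5, 0.7f), new Hit(1, 2, 0.1f));
        for (Hit h : mergeTopK(List.of(shard0, shard1), 3)) {
            System.out.println(h.globalId() + " score=" + h.score);
        }
    }
}
```

The same per-shard docid (5 here) can appear in two shards, which is why the sketch derives a separate long id for anything that has to be unique across the whole collection. Sorting everything is the simplest thing that works for a sketch; a real merge over large result lists would more likely use a bounded priority queue.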
