lucene-java-user mailing list archives

From Greg Bowyer <gbow...@fastmail.co.uk>
Subject Re: docid is just a signed int32
Date Thu, 18 Aug 2016 15:43:11 GMT
What are you trying to index that has more than 2 billion documents per
shard / index and cannot be split as Adrien suggests?
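
For anyone following along, here is a rough sketch of what splitting across shards looks like from the caller's side. It deliberately does not call Lucene: the class ShardMergeSketch, the Hit type, mergeTopN and globalId are all hypothetical names made up for illustration. The only ideas taken from the thread are that a docid is a per-index int and that per-shard results get merged afterwards (the job TopDocs.merge does in Lucene). It assumes scores are comparable across shards.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; none of these names are Lucene APIs.
final class ShardMergeSketch {

    /** One search hit: the shard it came from, its per-shard int docid, and its score. */
    static final class Hit {
        final int shard;
        final int doc;
        final float score;

        Hit(int shard, int doc, float score) {
            this.shard = shard;
            this.doc = doc;
            this.score = score;
        }
    }

    /**
     * Combine the top hits of every shard into one global top-n list.
     * Hits are ordered by descending score; ties are broken by shard, then docid.
     * This simply concatenates and sorts, which is fine for a sketch.
     */
    static List<Hit> mergeTopN(List<List<Hit>> perShard, int n) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perShard) {
            all.addAll(hits);
        }
        all.sort(Comparator.comparingDouble((Hit h) -> -h.score)
                .thenComparingInt(h -> h.shard)
                .thenComparingInt(h -> h.doc));
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }

    /**
     * Pack (shard, docid) into a single long. Each shard stays under the int
     * limit, while the combined key space is much larger than 2^31.
     */
    static long globalId(int shard, int doc) {
        return ((long) shard << 32) | (doc & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        List<Hit> shard0 = List.of(new Hit(0, 5, 3.0f), new Hit(0, 9, 1.0f));
        List<Hit> shard1 = List.of(new Hit(1, 2, 2.0f));
        for (Hit h : mergeTopN(List.of(shard0, shard1), 2)) {
            System.out.println(globalId(h.shard, h.doc) + " score=" + h.score);
        }
    }
}
```

The point of the (shard, docid) pair is that the int docid only has to be unique within one shard, so the per-index limit does not cap the size of the whole collection.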



On Thu, Aug 18, 2016, at 07:35 AM, Cristian Lorenzetto wrote:
> Maybe Lucene has a maximum size of 2^31 because result sets are Java
> arrays, whose length is an int.
> A suggestion for a possible future change is to use an Iterator instead
> of a Java array. An Iterator is a more scalable ADT, since it does not
> have to hold all returned documents in memory.
> 
> 
> 2016-08-18 16:03 GMT+02:00 Glen Newton <glen.newton@gmail.com>:
> 
> > Or maybe it is time Lucene re-examined this limit.
> >
> > There are use cases out there where >2^31 does make sense in a single index
> > (huge number of tiny docs).
> >
> > Also, I think the underlying hardware and the JDK have advanced to make
> > this more defendable.
> >
> > Constructively,
> > Glen
> >
> >
> > On Thu, Aug 18, 2016 at 9:55 AM, Adrien Grand <jpountz@gmail.com> wrote:
> >
> > > No, IndexWriter enforces that the number of documents cannot go over
> > > IndexWriter.MAX_DOCS (which is a bit less than 2^31) and
> > > BaseCompositeReader computes the number of documents in a long variable
> > > and ensures it is less than 2^31, so you cannot have indexes that
> > > contain more than 2^31 documents.
> > >
> > > Larger collections should be written to multiple shards and use
> > > TopDocs.merge to merge results.
> > >
> > > On Thu, Aug 18, 2016 at 15:38, Cristian Lorenzetto <
> > > cristian.lorenzetto@gmail.com> wrote:
> > >
> > > > docid is a signed int32, so it is not that big, but docid really
> > > > seems to be not an immutable primary key but a temporary id for the
> > > > view related to a specific search.
> > > >
> > > > So a repository can contain more than 2^31 documents.
> > > >
> > > > Is my deduction correct? Is there a maximum size for a Lucene index?
> > > >
> > >
> >
