lucene-solr-user mailing list archives

From Rich Cariens <richcari...@gmail.com>
Subject Re: Experience with indexing billions of documents?
Date Fri, 02 Apr 2010 17:31:16 GMT
A colleague of mine is using native Lucene + some home-grown
patches/optimizations to index over 13B small documents in a 32-shard
environment, which is around 406M docs per shard.

If there's a 2B doc id limitation in Lucene then I assume he's patched it
himself.
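
As a rough check of the figures above, here is a minimal Python sketch of the shard arithmetic. It assumes the per-index ceiling is the largest Java int (2^31 - 1), which is one common reading of the "about 2 billion" figure; the constant and function names are made up for illustration and are not part of Lucene or Solr.

```python
# Assumption: a single Lucene index can address at most 2^31 - 1 documents,
# because document ids are held in a Java int. Treat this as an estimate.
MAX_DOCS_PER_INDEX = 2**31 - 1  # 2,147,483,647


def docs_per_shard(total_docs: int, shards: int) -> int:
    """Documents per shard if the corpus is spread evenly (rounded up)."""
    return -(-total_docs // shards)  # integer ceiling division


def min_shards(total_docs: int, limit: int = MAX_DOCS_PER_INDEX) -> int:
    """Fewest shards that keep every shard at or below the assumed limit."""
    return -(-total_docs // limit)


per_shard = docs_per_shard(13_000_000_000, 32)
print(per_shard)                          # 406250000
print(per_shard < MAX_DOCS_PER_INDEX)     # True
print(min_shards(1_000_000_000))          # 1
print(min_shards(6_000_000_000))          # 3
```

Under that assumption, roughly 406M docs per shard sits well below the per-index ceiling, so the limit would only bite on a single unsharded index. The shard counts are lower bounds from the id limit alone; they say nothing about memory, disk, or query latency.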

On Fri, Apr 2, 2010 at 1:17 PM, <darren@ontrenet.com> wrote:

> My guess is that you will need to take advantage of Solr 1.5's upcoming
> cloud/cluster renovations and use multiple indexes to comfortably achieve
> those numbers. Hypothetically, in that case, you won't be limited by the
> single-index docid limitations of Lucene.
>
> > We are currently indexing 5 million books in Solr, scaling up over the
> > next few years to 20 million.  However we are using the entire book as a
> > Solr document.  We are evaluating the possibility of indexing individual
> > pages as there are some use cases where users want the most relevant pages
> > regardless of what book they occur in.  However, we estimate that we are
> > talking about somewhere between 1 and 6 billion pages and have concerns
> > over whether Solr will scale to this level.
> >
> > Does anyone have experience using Solr with 1-6 billion Solr documents?
> >
> > The Lucene file format document
> > (http://lucene.apache.org/java/3_0_1/fileformats.html#Limitations)
> > mentions a limit of about 2 billion document ids.  I assume this is the
> > Lucene internal document id and would therefore be a per-index/per-shard
> > limit.  Is this correct?
> >
> >
> > Tom Burton-West.
