lucene-java-user mailing list archives

From Doug Cutting
Subject Re: maximum index size
Date Thu, 09 Sep 2004 03:50:06 GMT
Chris Fraschetti wrote:
> I've seen throughout the list mentions of millions of documents.. 8
> million, 20 million, etc etc.. but can lucene potentially handle
> billions of documents and still efficiently search through them?

Lucene can currently handle up to 2^31 documents in a single index.  To 
a large degree this is limited by Java ints and arrays (which are 
accessed by ints).  There are also a few places where the file format 
limits things to 2^32.
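
As a rough illustration of the int ceiling described above, here is a small plain-JDK sketch. It is not Lucene code: the class DocIdLimits and its methods are hypothetical names made up for this example. Note that a Java int tops out at 2^31 - 1, so a count held in an int lands just under the 2^31 figure.

```java
// Hypothetical helper class for illustration only; nothing here is a Lucene API.
final class DocIdLimits {
    // Largest value a Java int can hold: 2^31 - 1 = 2147483647.
    static long maxIntValue() {
        return Integer.MAX_VALUE;
    }

    // True if a document count can be represented as a non-negative Java int.
    static boolean fitsInSingleIndex(long docCount) {
        return docCount >= 0 && docCount <= Integer.MAX_VALUE;
    }

    // Minimum number of int-addressable indexes needed to hold docCount documents
    // (ceiling division, done in long arithmetic to avoid overflow).
    static long indexesNeeded(long docCount) {
        return (docCount + Integer.MAX_VALUE - 1) / Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(maxIntValue());
        System.out.println(fitsInSingleIndex(20_000_000L));
        System.out.println(fitsInSingleIndex(3_000_000_000L));
        System.out.println(indexesNeeded(5_000_000_000L));
    }
}
```

So a collection of several billion documents would have to be split across more than one index on these grounds alone, before search time is even considered.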

On typical PC hardware, 2-3 word searches of an index with 10M 
documents, each with around 10k of text, require around 1 second, 
including index i/o time.  Performance is more-or-less linear, so that a 
100M document index might require nearly 10 seconds per search.  Thus, 
as indexes grow folks tend to distribute searches in parallel to many 
smaller indexes.  That's what Nutch and Google do.
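
The arithmetic above, and the idea of fanning a query out to several smaller indexes, can be sketched with the plain JDK. This is only a sketch under stated assumptions: the baseline (about 1 second at 10M documents) comes from the figures above, the scaling is assumed to be exactly linear, merge and network overhead are ignored, and ShardedSearchSketch and its Shard interface are hypothetical stand-ins rather than Lucene classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; none of these names come from Lucene.
final class ShardedSearchSketch {
    // Baseline taken from the message: roughly 1 second per search at 10M documents.
    static final double BASE_DOCS = 10_000_000.0;
    static final double BASE_SECONDS = 1.0;

    // Linear extrapolation of search time for a single index of the given size.
    static double estimateSeconds(long docs) {
        return BASE_SECONDS * (docs / BASE_DOCS);
    }

    // If the documents are split evenly and the shards are searched in parallel,
    // latency is assumed to follow the size of one shard (overhead ignored).
    static double estimateShardedSeconds(long docs, int shards) {
        long perShard = (docs + shards - 1) / shards; // ceiling division
        return estimateSeconds(perShard);
    }

    // Stand-in for "search one smaller index": returns a hit count for a query.
    interface Shard {
        int countHits(String query);
    }

    // Submits the query to every shard at once and sums the per-shard hit counts.
    static int searchAll(List<Shard> shards, String query) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (Shard shard : shards) {
                Callable<Integer> task = () -> shard.countHits(query);
                futures.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> future : futures) {
                total += future.get(); // blocks until that shard has answered
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while searching", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a shard search failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Under these assumptions, estimateSeconds(100_000_000L) gives 10.0, matching the 100M-document figure above, while estimateShardedSeconds(100_000_000L, 10) drops back to 1.0 because each of the ten shards holds 10M documents.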

