lucene-solr-user mailing list archives
From Shawn Heisey <>
Subject Re: How large is your solr index?
Date Sat, 03 Jan 2015 18:05:08 GMT
On 1/3/2015 9:02 AM, Erick Erickson wrote:
> bq: For Solr 5 why don't we switch it to 64 bit ??
> -1 on this for a couple of reasons
>> it'd be pretty invasive, and 5.0 may be imminent. Far too big a change to implement
>> at the last second
>> It's not clear that it's even useful. Once you get to that many documents, performance
>> usually suffers
> Of course I wouldn't be doing the work so I really don't have much of
> a vote, but it's not clear to me at
> all that enough people would actually have a use-case for 2b+ docs in
> a single shard to make it
> worthwhile. At that scale GC potentially becomes really unpleasant for
> instance....

I agree, 2 billion documents in a single index is MORE than enough.  If
you actually create an index that large, you're going to have
performance problems, and most of those performance problems will likely
be related to garbage collection.  I can extrapolate one such problem
from personal experience on a much smaller index.

A filterCache entry is a bitset with one bit per document, so for a 2
billion (2^31) document index each entry is 256MB in size.  Assuming
you're using the G1 collector, the maximum size for a G1 heap region is
32MB, and any allocation of half a region or more is a "humongous
allocation" that bypasses the young generation and is allocated
immediately from the old generation.  At that index size, every single
filter qualifies.  Allocating that much memory from the old generation
will eventually (and frequently) trigger a full garbage collection
... and you do not want your application to wait for a full garbage
collection on the heap size that would be required for a 2 billion
document index.  It could easily exceed 30 or 60 seconds.
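The arithmetic behind that 256MB figure can be sketched as follows (a
back-of-the-envelope check, assuming the common case where a cached
filter is a bitset with one bit per document in the index):

```java
// Illustrative only: estimates the size of one filterCache bitset entry
// for an index at the 2^31 document limit, at one bit per document.
public class FilterCacheSize {
    public static void main(String[] args) {
        long maxDoc = 1L << 31;                 // ~2.1 billion documents
        long bytes = maxDoc / 8;                // one bit per document
        long megabytes = bytes / (1024 * 1024);
        System.out.println(megabytes + " MB per cached filter"); // 256 MB
    }
}
```

Multiply that by a filterCache with hundreds or thousands of entries and
the heap pressure from filters alone becomes enormous.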

When you consider the current limitations of G1GC, it would be advisable
to keep each Solr index below 100 million documents.  At 134,217,728
documents, each filter bitset reaches 16MB, the humongous-allocation
threshold for the maximum heap region size (half of 32MB), so it can no
longer be treated as a normal allocation.
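Where that 134,217,728 figure comes from can be sketched like this
(assuming G1 classifies any allocation of at least half a region as
humongous, and one bit per document in the filter bitset):

```java
// Illustrative only: the document count at which a one-bit-per-document
// filter bitset hits the G1 humongous threshold for 32MB regions.
public class HumongousThreshold {
    public static void main(String[] args) {
        long regionBytes = 32L * 1024 * 1024;   // maximum G1 region size
        long humongousBytes = regionBytes / 2;  // >= half a region is humongous
        long maxDocs = humongousBytes * 8;      // 8 documents per byte
        System.out.println(maxDocs);            // 134217728
    }
}
```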

Even with the older battle-tested CMS collector (assuming good tuning
options), I think the huge object sizes (and the huge number of smaller
objects) resulting from a 2 billion document index will cause major
garbage collection problems.
