lucene-solr-user mailing list archives

From Shawn Heisey <apa...@elyograg.org>
Subject Re: severe problems with soft and hard commits in a large index
Date Wed, 06 May 2015 13:30:45 GMT
On 5/6/2015 1:58 AM, adfel70 wrote:
> I have a cluster of 16 shards, 3 replicas. the cluster indexed nested
> documents.
> it currently has 3 billion documents overall (parent and children).
> each shard has around 200 million docs. size of each shard is 250GB.
> this runs on 12 machines. each machine has 4 SSD disks and 4 solr processes.
> each process has 28GB heap.  each machine has 196GB RAM.
> 
> I perform periodic indexing throughout the day. each indexing cycle adds
> around 1.5 million docs. I keep the indexing load light - 2 processes with
> bulks of 20 docs.
> 
> My use case demands that each indexing cycle will be visible only when the
> whole cycle finishes.
> 
> I tried various methods of using soft and hard commits:

I personally would configure autoCommit on a five minute (maxTime of
300000) interval with openSearcher=false.  The use case you have
outlined (not seeing changes until the indexing is done) demands that
you do NOT turn on autoSoftCommit, and that you do one manual commit at
the end of indexing, which could be either a soft commit or a hard
commit.  I would recommend a soft commit.
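
As a rough sketch of that single end-of-cycle commit: the snippet below
builds the update URL for an explicit soft commit and defines a helper
that would send it.  The host, port, and collection name are
placeholders I made up for illustration, not values from your setup.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def soft_commit_url(base_url: str, collection: str) -> str:
    """Build the update URL that asks Solr for one explicit soft commit."""
    query = urlencode({"softCommit": "true", "wt": "json"})
    return f"{base_url.rstrip('/')}/{collection}/update?{query}"


def send_soft_commit(base_url: str, collection: str, timeout: float = 600.0) -> int:
    """Send the commit request and return the HTTP status code.

    Call this once, after the whole indexing cycle has finished.
    """
    with urlopen(soft_commit_url(base_url, collection), timeout=timeout) as resp:
        return resp.status


# Hypothetical base URL and collection name, for illustration only.
url = soft_commit_url("http://localhost:8983/solr", "mycollection")
print(url)
```

With autoSoftCommit disabled and autoCommit using openSearcher=false,
this request would be the only thing that makes the new documents
visible.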

Because it is the openSearcher part of a commit that's very expensive,
you can successfully do autoCommit with openSearcher=false on an
interval like 10 or 15 seconds and not see much in the way of immediate
performance loss.  That commit is still not free, not only in terms of
resources, but also in terms of the Java heap garbage it generates.

The general advice with commits is to do them as infrequently as you
can, which applies to ANY commit, not just those that make changes visible.

> with all methods I encounter pretty much the same problem:
> 1. heavy GCs when soft commit is performed (methods 1,2) or when hardcommit
> opensearcher=true is performed. these GCs cause heavy latency (average
> latency is 3 secs. latency during the problem is 80secs)
> 2. if indexing cycles come too often, which causes softcommits or
> hardcommits(opensearcher=true) occur with a small interval one after another
> (around 5-10minutes), I start getting many OOM exceptions.

If you're getting OOM, then either you need to change things so Solr
requires less heap memory, or you need to increase the heap size.
Changing things might be either the config or how you use Solr.

Are you tuning your garbage collection?  With a 28GB heap, tuning is not
optional.  It's so important that the startup scripts in 5.0 and 5.1
include it, even though the default max heap is 512MB.

Let's do some quick math on your memory.  You have four instances of
Solr on each machine, each with a 28GB heap.  That's 112GB of memory
allocated to Java.  With 196GB total, you have approximately 84GB of RAM
left over for caching your index.

A 16-shard index with three replicas means 48 cores.  Divide that by 12
machines and that's 4 replicas on each server, presumably one in each
Solr instance.  You say that the size of each shard is 250GB, so you've
got about a terabyte of index on each server, but only 84GB of RAM for
caching.

Even with SSD, that's not going to be anywhere near enough cache memory
for good Solr performance.
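
For anyone who wants to redo the memory math above with their own
numbers, here it is as a short script.  All inputs come from the
figures quoted in this thread; the variable names are just mine.

```python
# Figures taken from the description of the cluster in this thread.
instances_per_machine = 4
heap_gb_per_instance = 28
ram_gb_per_machine = 196
shards = 16
replicas_per_shard = 3
machines = 12
shard_size_gb = 250

# Memory handed to Java heaps, and what is left for the OS disk cache.
heap_total_gb = instances_per_machine * heap_gb_per_instance
cache_gb = ram_gb_per_machine - heap_total_gb

# Cores (shard replicas) per machine, and index data per machine.
total_cores = shards * replicas_per_shard
cores_per_machine = total_cores // machines
index_gb_per_machine = cores_per_machine * shard_size_gb

# Fraction of the on-disk index that could fit in the OS disk cache.
cached_fraction = cache_gb / index_gb_per_machine

print(f"heap per machine:  {heap_total_gb} GB")
print(f"left for caching:  {cache_gb} GB")
print(f"cores per machine: {cores_per_machine}")
print(f"index per machine: {index_gb_per_machine} GB")
print(f"cacheable share:   {cached_fraction:.1%}")
```

So under these numbers less than a tenth of the index on each machine
can be held in the OS disk cache at any one time.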

All these memory issues, including GC tuning, are discussed on this wiki
page:

http://wiki.apache.org/solr/SolrPerformanceProblems

One additional note: By my calculations, each filterCache entry will be
at least 23MB in size.  This means that if you are using the filterCache
and the G1 collector, you will not be able to avoid humongous
allocations, which are allocations larger than half the G1 region
size.  The max configurable G1 region size is 32MB.  You should use the
CMS collector for your GC tuning, not G1.  If you can reduce the number
of documents in each shard, G1 might work well.
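
Here is roughly how that filterCache estimate works out.  It assumes
each entry is a bitset with one bit per document in the shard, and it
takes the 200 million docs per shard from your description; treat the
result as a lower bound, not an exact figure.

```python
MIB = 2 ** 20

docs_per_shard = 200_000_000   # figure quoted in this thread
g1_max_region_mib = 32         # largest configurable G1 region size

# Assumption: one bit per document, so 8 documents per byte.
entry_bytes = docs_per_shard / 8
entry_mib = entry_bytes / MIB

# G1 treats an allocation as humongous at about half a region.
humongous_threshold_mib = g1_max_region_mib / 2
is_humongous = entry_mib > humongous_threshold_mib

# Largest shard whose bitset would still fit under that threshold.
max_docs_under_threshold = int(humongous_threshold_mib * MIB * 8)

print(f"filterCache entry:   {entry_mib:.1f} MiB")
print(f"humongous threshold: {humongous_threshold_mib:.0f} MiB")
print(f"humongous:           {is_humongous}")
print(f"docs per shard to stay under: {max_docs_under_threshold:,}")
```

By this estimate a shard would need to be somewhere below about 134
million documents before a filterCache entry stops being a humongous
allocation at the largest region size.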

Thanks,
Shawn

