lucene-solr-user mailing list archives

From Jeff Wartes <jwar...@whitepages.com>
Subject Re: Memory leak in Solr
Date Mon, 05 Dec 2016 03:46:03 GMT

Here’s an earlier post where I mentioned some GC investigation tools:
https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201604.mbox/%3C8F8FA32D-EC0E-4352-86F7-4B2D8A906903@whitepages.com%3E

In my experience, many aspects of the Solr/Lucene memory allocation model scale with things
other than documents returned, such as cardinality or simply index size. A single query on
a large index might consume dozens of megabytes of heap to complete, but that heap should
also be released quickly after the query finishes.
The key characteristic of a memory leak is that the software is allocating memory that it
cannot reclaim. If it’s a leak, you ought to be able to reproduce it at any query rate -
have you tried this? A run with, say, half the rate, over twice the duration?
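
One way to tell the two cases apart is to watch the heap floor, meaning the heap still in use
right after each full collection. Below is a minimal sketch, assuming you have already pulled
(elapsed minutes, heap-after-GC in MB) pairs out of your GC logs. The class name HeapFloorTrend,
the method slope, and the sample arrays are all made up for illustration; they are not real
measurements and not part of Solr.

```java
public class HeapFloorTrend {
    // Least-squares slope of heap-used-after-GC (MB) against elapsed time (minutes).
    // A slope near zero suggests the heap is being reclaimed; a steady positive
    // slope across runs at different query rates is what a leak would look like.
    static double slope(double[] minutes, double[] heapAfterGcMb) {
        if (minutes.length != heapAfterGcMb.length || minutes.length < 2) {
            throw new IllegalArgumentException("need at least two paired samples");
        }
        int n = minutes.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += minutes[i];
            meanY += heapAfterGcMb[i];
        }
        meanX /= n;
        meanY /= n;

        double num = 0, den = 0;
        for (int i = 0; i < n; i++) {
            double dx = minutes[i] - meanX;
            num += dx * (heapAfterGcMb[i] - meanY);
            den += dx * dx;
        }
        if (den == 0) {
            throw new IllegalArgumentException("all samples share one timestamp");
        }
        return num / den;
    }

    public static void main(String[] args) {
        // Illustrative inputs only (hypothetical numbers).
        double[] t = {0, 10, 20, 30, 40};
        double[] flat = {900, 910, 895, 905, 900};
        double[] rising = {900, 1000, 1100, 1200, 1300};
        System.out.printf("flat floor:   %.2f MB/min%n", slope(t, flat));
        System.out.printf("rising floor: %.2f MB/min%n", slope(t, rising));
    }
}
```

A floor that keeps climbing at both the full rate and the half rate points toward a leak. A flat
floor with long pauses points toward allocation pressure instead.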

I’m inclined to agree with others here, that although you’ve correctly attributed the
cause to GC, it’s probably less an indication of a leak, and more an indication of simply
allocating memory faster than it can be reclaimed, combined with the long pauses that are
increasingly unavoidable as heap size goes up.
Note that in the case of a CMS allocation failure, the fallback full-GC is *single threaded*,
which means it’ll usually take considerably longer than a normal GC - even for a comparable
amount of garbage.
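
If you want a quick look at how much time each collector is costing, the standard
java.lang.management API exposes per-collector counts and cumulative times. The sketch below
only reports on the JVM it runs in; to see Solr's numbers you would read the same MXBeans
from the Solr JVM, for example over JMX, and that wiring is left out here. The class name
GcPauseReport is hypothetical. Note also that the reported time is accumulated collection
time, which for a concurrent collector is not the same thing as stop-the-world pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcPauseReport {
    // One line per collector: name, number of collections, cumulative time,
    // and the average time per collection when a count is available.
    static List<String> report() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();   // -1 if undefined for this collector
            long millis = gc.getCollectionTime();   // -1 if undefined for this collector
            String avg = (count > 0 && millis >= 0)
                    ? String.format("%.1f", (double) millis / count)
                    : "n/a";
            lines.add(String.format("%s: collections=%d totalMs=%d avgMs=%s",
                    gc.getName(), count, millis, avg));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : report()) {
            System.out.println(line);
        }
    }
}
```

An old-generation collector whose average time per collection is far above the others is
consistent with the fallback full GCs described above, though the GC log itself is the
better place to confirm it.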

In addition to GC tuning, you can address both the allocation rate and the pause length by
sharding more, at both the core and JVM level.


On 12/4/16, 3:46 PM, "Shawn Heisey" <apache@elyograg.org> wrote:

    On 12/3/2016 9:46 PM, S G wrote:
    > The symptom we see is that the java clients querying Solr see response
    > times in 10s of seconds (not milliseconds).
    <snip>
    > Some numbers for the Solr Cloud:
    >
    > *Overall infrastructure:*
    > - Only one collection
    > - 16 VMs used
    > - 8 shards (1 leader and 1 replica per shard - each core on separate VM)
    >
    > *Overview from one core:*
    > - Num Docs:193,623,388
    > - Max Doc:230,577,696
    > - Heap Memory Usage:231,217,880
    > - Deleted Docs:36,954,308
    > - Version:2,357,757
    > - Segment Count:37
    
    The heap memory usage number isn't useful.  It doesn't cover all the
    memory used.
    
    > *Stats from QueryHandler/select*
    > - requests:78,557
    > - errors:358
    > - timeouts:0
    > - totalTime:1,639,975.27
    > - avgRequestsPerSecond:2.62
    > - 5minRateReqsPerSecond:1.39
    > - 15minRateReqsPerSecond:1.64
    > - avgTimePerRequest:20.87
    > - medianRequestTime:0.70
    > - 75thPcRequestTime:1.11
    > - 95thPcRequestTime:191.76
    
    These times are in *milliseconds*, not seconds .. and these are even
    better numbers than you showed before.  Where are you seeing 10 plus
    second query times?  Solr is not showing numbers like that.
    
    If your VM host has 16 VMs on it and each one has a total memory size of
    92GB, then if that machine doesn't have 1.5 terabytes of memory, you're
    oversubscribed, and this is going to lead to terrible performance... but
    the numbers you've shown here do not show terrible performance.
    
    > Plus, on every server, we are seeing lots of exceptions.
    > For example:
    >
    > Between 8:06:55 PM and 8:21:36 PM, exceptions are:
    >
    > 1) Request says it is coming from leader, but we are the leader:
    > update.distrib=FROMLEADER&distrib.from=HOSTB_ca_1_1456430020/&wt=javabin&version=2
    >
    > 2) org.apache.solr.common.SolrException: Request says it is coming from
    > leader, but we are the leader
    >
    > 3) org.apache.solr.common.SolrException:
    > org.apache.solr.client.solrj.SolrServerException: Tried one server for read
    > operation and it timed out, so failing fast
    >
    > 4) null:org.apache.solr.common.SolrException:
    > org.apache.solr.client.solrj.SolrServerException: Tried one server for read
    > operation and it timed out, so failing fast
    >
    > 5) org.apache.solr.common.SolrException:
    > org.apache.solr.client.solrj.SolrServerException: Tried one server for read
    > operation and it timed out, so failing fast
    >
    > 6) null:org.apache.solr.common.SolrException:
    > org.apache.solr.client.solrj.SolrServerException: Tried one server for read
    > operation and it timed out, so failing fast
    >
    > 7) org.apache.solr.common.SolrException:
    > org.apache.solr.client.solrj.SolrServerException: No live SolrServers
    > available to handle this request. Zombie server list:
    > [HOSTA_ca_1_1456429897]
    >
    > 8) null:org.apache.solr.common.SolrException:
    > org.apache.solr.client.solrj.SolrServerException: No live SolrServers
    > available to handle this request. Zombie server list:
    > [HOSTA_ca_1_1456429897]
    >
    > 9) org.apache.solr.common.SolrException:
    > org.apache.solr.client.solrj.SolrServerException: Tried one server for read
    > operation and it timed out, so failing fast
    >
    > 10) null:org.apache.solr.common.SolrException:
    > org.apache.solr.client.solrj.SolrServerException: Tried one server for read
    > operation and it timed out, so failing fast
    >
    > 11) org.apache.solr.common.SolrException:
    > org.apache.solr.client.solrj.SolrServerException: Tried one server for read
    > operation and it timed out, so failing fast
    >
    > 12) null:org.apache.solr.common.SolrException:
    > org.apache.solr.client.solrj.SolrServerException: Tried one server for read
    > operation and it timed out, so failing fast
    
    These errors sound like timeouts, possibly caused by long GC pauses ...
    but as already mentioned, the query handler statistics do not indicate
    long query times.  If a long GC were to happen during a query, then the
    query time would be long as well.
    
    The core information above doesn't include the size of the index on
    disk.  That number would be useful for telling you whether there's
    enough memory.
    
    As I said at the beginning of the thread, I haven't seen anything here
    to indicate a memory leak, and others are using version 4.10 without any
    problems.  If there were a memory leak in a released version of Solr,
    many people would have run into problems with it.
    
    Thanks,
    Shawn
    
    
