lucene-solr-user mailing list archives

From William Bell <billnb...@gmail.com>
Subject Re: Memory leak in Solr
Date Wed, 07 Dec 2016 08:28:41 GMT
What do you mean by JVM level? Run Solr on different ports on the same
machine? If you have a 32-core box, would you run 2, 3, or 4 JVMs?

On Sun, Dec 4, 2016 at 8:46 PM, Jeff Wartes <jwartes@whitepages.com> wrote:

>
> Here’s an earlier post where I mentioned some GC investigation tools:
> https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201604.mbox/%3C8F8FA32D-EC0E-4352-86F7-4B2D8A906903@whitepages.com%3E
>
> In my experience, there are many aspects of the Solr/Lucene memory
> allocation model that scale with things other than documents returned
> (such as cardinality, or simply index size). A single query on a large index
> might consume dozens of megabytes of heap to complete. But that heap should
> also be released quickly after the query finishes.
> The key characteristic of a memory leak is that the software is allocating
> memory that it cannot reclaim. If it’s a leak, you ought to be able to
> reproduce it at any query rate - have you tried this? A run with, say, half
> the rate, over twice the duration?
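The test described above can be sketched roughly as follows. This is only an illustration, not anything from the thread: the function names, the threshold, and the sample numbers are all made up. It assumes you have recorded heap occupancy right after each full GC together with the cumulative number of queries served at that point. A leak should show retained heap growing with queries served at any query rate, while a plain allocation-rate problem should leave the post-GC floor roughly flat.

```python
from typing import Sequence


def slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def looks_like_leak(queries_served: Sequence[float],
                    heap_after_full_gc_mb: Sequence[float],
                    min_mb_per_1k_queries: float = 1.0) -> bool:
    """True if retained heap grows steadily with the number of queries served.

    The threshold is an arbitrary illustrative value, not a recommendation.
    """
    growth_per_1k = slope(queries_served, heap_after_full_gc_mb) * 1000
    return growth_per_1k >= min_mb_per_1k_queries


# Synthetic samples (hypothetical, for illustration only).
queries = [0, 10_000, 20_000, 30_000]
leaking_heap_mb = [900, 1400, 1900, 2400]  # floor keeps rising
steady_heap_mb = [900, 910, 895, 905]      # floor stays flat

leak_detected = looks_like_leak(queries, leaking_heap_mb)
steady_detected = looks_like_leak(queries, steady_heap_mb)
print(leak_detected, steady_detected)
```

Running the same check on a half-rate, double-duration run and getting the same growth per query would point toward a leak; a flat floor in both runs would point toward allocation rate and GC pauses instead.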
>
> I’m inclined to agree with others here, that although you’ve correctly
> attributed the cause to GC, it’s probably less an indication of a leak, and
> more an indication of simply allocating memory faster than it can be
> reclaimed, combined with the long pauses that are increasingly unavoidable
> as heap size goes up.
> Note that in the case of a CMS allocation failure, the fallback full-GC is
> *single threaded*, which means it’ll usually take considerably longer than
> a normal GC - even for a comparable amount of garbage.
>
> In addition to GC tuning, you can address these by sharding more, both at
> the core and JVM level.
>
>
> On 12/4/16, 3:46 PM, "Shawn Heisey" <apache@elyograg.org> wrote:
>
>     On 12/3/2016 9:46 PM, S G wrote:
>     > The symptom we see is that the java clients querying Solr see response
>     > times in 10s of seconds (not milliseconds).
>     <snip>
>     > Some numbers for the Solr Cloud:
>     >
>     > *Overall infrastructure:*
>     > - Only one collection
>     > - 16 VMs used
>     > - 8 shards (1 leader and 1 replica per shard - each core on separate VM)
>     >
>     > *Overview from one core:*
>     > - Num Docs:193,623,388
>     > - Max Doc:230,577,696
>     > - Heap Memory Usage:231,217,880
>     > - Deleted Docs:36,954,308
>     > - Version:2,357,757
>     > - Segment Count:37
>
>     The heap memory usage number isn't useful.  It doesn't cover all the
>     memory used.
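As a quick sanity check on the core overview quoted above, the figures can be cross-checked with a few lines of Python. This is a minimal sketch: the variable names are made up here, and reading "Heap Memory Usage" as a byte count is an assumption rather than something stated in the thread.

```python
# Figures copied from the quoted core overview.
num_docs = 193_623_388
max_doc = 230_577_696
deleted_docs = 36_954_308
heap_memory_usage = 231_217_880  # assumed to be bytes

# Max Doc should equal live documents plus deleted documents.
assert num_docs + deleted_docs == max_doc

# Share of the index taken up by deleted documents.
deleted_fraction = deleted_docs / max_doc
print(f"{deleted_fraction:.1%}")  # 16.0%

# If the heap figure is in bytes, it is only a couple of hundred MiB.
heap_mib = heap_memory_usage / (1024 * 1024)
print(f"{heap_mib:.1f} MiB")
```

If the byte assumption holds, the reported figure is roughly 220 MiB, which fits the point that this number does not cover all the memory used.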
>
>     > *Stats from QueryHandler/select*
>     > - requests:78,557
>     > - errors:358
>     > - timeouts:0
>     > - totalTime:1,639,975.27
>     > - avgRequestsPerSecond:2.62
>     > - 5minRateReqsPerSecond:1.39
>     > - 15minRateReqsPerSecond:1.64
>     > - avgTimePerRequest:20.87
>     > - medianRequestTime:0.70
>     > - 75thPcRequestTime:1.11
>     > - 95thPcRequestTime:191.76
>
>     These times are in *milliseconds*, not seconds ... and these are even
>     better numbers than you showed before.  Where are you seeing 10 plus
>     second query times?  Solr is not showing numbers like that.
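The handler statistics quoted above can be checked against each other. The sketch below assumes that avgTimePerRequest is simply totalTime divided by requests, with both in milliseconds; the variable names are illustrative only.

```python
# Figures copied from the quoted QueryHandler/select statistics.
requests = 78_557
total_time_ms = 1_639_975.27
reported_avg_ms = 20.87

# Assumed relationship: average time per request = total time / request count.
avg_ms = total_time_ms / requests
print(f"{avg_ms:.2f}")  # 20.88

# Expressed in seconds, the average is about two hundredths of a second.
avg_seconds = avg_ms / 1000
print(f"{avg_seconds:.3f}")
```

The computed average agrees with the reported 20.87 to within rounding, which supports reading these values as milliseconds: an average of about 0.02 seconds per request, far from the 10-plus seconds seen by the clients.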
>
>     If your VM host has 16 VMs on it and each one has a total memory size of
>     92GB, then if that machine doesn't have 1.5 terabytes of memory, you're
>     oversubscribed, and this is going to lead to terrible performance... but
>     the numbers you've shown here do not show terrible performance.
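The 1.5 terabyte figure follows from simple multiplication. A minimal sketch, assuming (as in the hypothetical above) that all 16 VMs share one physical host; the function names and the example host sizes are made up for illustration.

```python
def required_host_memory_gb(vm_count: int, memory_per_vm_gb: int) -> int:
    """Memory the host needs so that every VM's allocation is backed by RAM."""
    return vm_count * memory_per_vm_gb


def is_oversubscribed(host_memory_gb: int, vm_count: int,
                      memory_per_vm_gb: int) -> bool:
    """True if the VMs are promised more memory than the host has."""
    return required_host_memory_gb(vm_count, memory_per_vm_gb) > host_memory_gb


total_gb = required_host_memory_gb(16, 92)
print(total_gb)  # 1472, i.e. roughly 1.5 TB

# Hypothetical host sizes, for illustration only.
print(is_oversubscribed(1024, 16, 92))
print(is_oversubscribed(1536, 16, 92))
```

This ignores memory needed by the hypervisor itself, so the real requirement would be somewhat higher than the raw product.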
>
>     > Plus, on every server, we are seeing lots of exceptions.
>     > For example:
>     >
>     > Between 8:06:55 PM and 8:21:36 PM, exceptions are:
>     >
>     > 1) Request says it is coming from leader, but we are the leader:
>     > update.distrib=FROMLEADER&distrib.from=HOSTB_ca_1_1456430020/&wt=javabin&version=2
>     >
>     > 2) org.apache.solr.common.SolrException: Request says it is coming from
>     > leader, but we are the leader
>     >
>     > 3) org.apache.solr.common.SolrException:
>     > org.apache.solr.client.solrj.SolrServerException: Tried one server for read
>     > operation and it timed out, so failing fast
>     >
>     > 4) null:org.apache.solr.common.SolrException:
>     > org.apache.solr.client.solrj.SolrServerException: Tried one server for read
>     > operation and it timed out, so failing fast
>     >
>     > 5) org.apache.solr.common.SolrException:
>     > org.apache.solr.client.solrj.SolrServerException: Tried one server for read
>     > operation and it timed out, so failing fast
>     >
>     > 6) null:org.apache.solr.common.SolrException:
>     > org.apache.solr.client.solrj.SolrServerException: Tried one server for read
>     > operation and it timed out, so failing fast
>     >
>     > 7) org.apache.solr.common.SolrException:
>     > org.apache.solr.client.solrj.SolrServerException: No live SolrServers
>     > available to handle this request. Zombie server list:
>     > [HOSTA_ca_1_1456429897]
>     >
>     > 8) null:org.apache.solr.common.SolrException:
>     > org.apache.solr.client.solrj.SolrServerException: No live SolrServers
>     > available to handle this request. Zombie server list:
>     > [HOSTA_ca_1_1456429897]
>     >
>     > 9) org.apache.solr.common.SolrException:
>     > org.apache.solr.client.solrj.SolrServerException: Tried one server for read
>     > operation and it timed out, so failing fast
>     >
>     > 10) null:org.apache.solr.common.SolrException:
>     > org.apache.solr.client.solrj.SolrServerException: Tried one server for read
>     > operation and it timed out, so failing fast
>     >
>     > 11) org.apache.solr.common.SolrException:
>     > org.apache.solr.client.solrj.SolrServerException: Tried one server for read
>     > operation and it timed out, so failing fast
>     >
>     > 12) null:org.apache.solr.common.SolrException:
>     > org.apache.solr.client.solrj.SolrServerException: Tried one server for read
>     > operation and it timed out, so failing fast
>
>     These errors sound like timeouts, possibly caused by long GC pauses ...
>     but as already mentioned, the query handler statistics do not indicate
>     long query times.  If a long GC were to happen during a query, then the
>     query time would be long as well.
>
>     The core information above doesn't include the size of the index on
>     disk.  That number would be useful for telling you whether there's
>     enough memory.
>
>     As I said at the beginning of the thread, I haven't seen anything here
>     to indicate a memory leak, and others are using version 4.10 without any
>     problems.  If there were a memory leak in a released version of Solr,
>     many people would have run into problems with it.
>
>     Thanks,
>     Shawn
>
>
>
>


-- 
Bill Bell
billnbell@gmail.com
cell 720-256-8076
