lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: solr 4.10.3 and index.xxxxxxxxxxx directory
Date Wed, 01 Apr 2015 18:04:14 GMT
I _really_ suspect that with the huge JVM heaps you had, you were hitting long
GC pauses that exceeded the ZooKeeper timeout, causing ZK to believe the
node had gone away and thus throwing it into recovery mode.

You can enable GC logging to check for such long pauses, but with 96G
it's almost certain that you had them.

Reducing the JVM allocation should help, but if you continue to see
nodes go into recovery for no apparent reason, enabling GC logging is a
good idea so you have a record.
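
Once GC logging is on, the log can be scanned for pauses longer than the
ZooKeeper session timeout. The sketch below is only an illustration and
assumes a few things that are not stated in this thread: that the JVM is
a Java 7/8 HotSpot started with -XX:+PrintGCApplicationStoppedTime (which
writes the "Total time for which application threads were stopped" lines
matched here), and that you supply your own configured timeout in seconds
as the threshold. The function name long_pauses is made up for this
example.

```python
import re

# Assumed line format: what HotSpot (Java 7/8) writes when started with
# -XX:+PrintGCApplicationStoppedTime. Some locales print a decimal comma.
STOPPED = re.compile(
    r"Total time for which application threads were stopped: "
    r"(\d+(?:[.,]\d+)?) seconds"
)


def long_pauses(lines, threshold_seconds):
    """Return (line_number, pause_seconds) for each pause above the threshold.

    `lines` is any iterable of log lines, for example an open GC log file.
    `threshold_seconds` should be your ZooKeeper session timeout in seconds.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        match = STOPPED.search(line)
        if match is None:
            continue
        pause = float(match.group(1).replace(",", "."))
        if pause > threshold_seconds:
            hits.append((number, pause))
    return hits
```

To use it, open the GC log and pass the file object plus your timeout,
e.g. long_pauses(open("gc.log"), 15.0); the path and the 15-second value
are placeholders. Any hit is a stop-the-world pause long enough for
ZooKeeper to consider the node gone.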

See "Getting a view into garbage collection" here:
https://lucidworks.com/blog/garbage-collection-bootcamp-1-0/

Best
Erick

On Wed, Apr 1, 2015 at 10:35 AM, Dominique Bejean
<dominique.bejean@eolya.fr> wrote:
> Hi Shawn,
>
> Thank you for your response.
>
> This is a SolrCloud installation on CentOS.
>
> There are 5 servers with 128 GB of RAM each.
> The collection contains 650 million small documents.
> There are 3 shards with replicationFactor = 2 (so 9 cores).
> The JVM Xmx parameter was set to 96 GB. We changed it yesterday to 32 GB in
> order to be under the CompressedOops limit and to free memory for
> MMapDirectory.
>
> I will have access to both the full Solr and Tomcat logs tomorrow.
>
> What I know is that there are some ZooKeeper timeouts in the Solr logs.
> Replication also occurs on some nodes after some commits (after a DIH
> import) and when nodes restart.
>
> So, I will have more precise log messages tomorrow.
>
> Thank you for your response.
>
> Dominique
>
>
>
> 2015-04-01 18:29 GMT+02:00 Shawn Heisey <apache@elyograg.org>:
>
>> On 4/1/2015 6:35 AM, Dominique Bejean wrote:
>> > Is it normal with Solr 4.10.3 that the data directory of replicas still
>> > contains directories like
>> >
>> > index.3636365667474747
>> > index.9990809809888876
>> >
>> > and files
>> >
>> > index.properties
>> > replica.properties
>> >
>> > If so, why, and in which circumstances?
>>
>> The index.nnnnnnnnnnnnnnnn directories are created during master/slave
>> index replication.  If you're running SolrCloud, then replication is
>> only used for index recovery.  Index recovery is only required in
>> situations where the replicas are so far behind that the transaction log
>> cannot be used to synchronize them, and sometimes happens when a Solr
>> node is restarted.  If SolrCloud index recovery is actually required
>> when you are NOT restarting Solr instances, your index might be having
>> problems.
>>
>> Regardless of whether you're running SolrCloud or not, normally when one
>> of those directories with a numeric suffix is created, it will be
>> changed to "index" with no suffix after the replication is complete, but
>> if Solr is unable to change the directories for some reason, it will
>> simply keep and use the new directory with the suffix.  Do you see any
>> ERROR or WARN entries in your solr logfile that would indicate why Solr
>> cannot change the directory name?  Are you on Windows?  Problems like
>> this are more common on Windows, because Windows prevents a lot of file
>> operations when files/directories are open.
>>
>> The long-term existence of directories with this naming convention
>> indicates that *something* went wrong, but you would need to consult
>> your logs to find out what happened.  There have been several bugs over
>> Solr's history that cause this problem.
>>
>> Thanks,
>> Shawn
>>
>>
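
To see which of the suffixed directories Shawn describes are actually
leftovers, one can compare the directories on disk with the one Solr is
using. The sketch below is a read-only illustration under a stated
assumption: that index.properties is a plain key=value file whose "index"
key names the active index directory, and that plain "index" is used when
the file or key is absent. Both function names are invented for this
example, and nothing is deleted.

```python
import os


def read_active_index(data_dir):
    """Return the directory named by the 'index' key in index.properties.

    Assumption: index.properties is a key=value file. Falls back to the
    plain 'index' directory name if the file or the key is missing.
    """
    path = os.path.join(data_dir, "index.properties")
    if not os.path.isfile(path):
        return "index"
    with open(path, encoding="latin-1") as handle:
        for raw in handle:
            line = raw.strip()
            if line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            if key.strip() == "index":
                return value.strip()
    return "index"


def leftover_index_dirs(data_dir):
    """List index.<suffix> directories that are not the active index."""
    active = read_active_index(data_dir)
    return sorted(
        name
        for name in os.listdir(data_dir)
        if name.startswith("index.")
        and os.path.isdir(os.path.join(data_dir, name))
        and name != active
    )
```

Calling leftover_index_dirs on a core's data directory returns the
candidates to investigate. Check the Solr logs for the ERROR or WARN
entries mentioned above before removing anything by hand.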
