From "Shawn Heisey (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
Date Thu, 27 Apr 2017 15:54:04 GMT

    [ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15986859#comment-15986859 ]

Shawn Heisey commented on SOLR-10130:
-------------------------------------

I have a question related to this issue.  Somebody on the IRC channel running 6.4.2 is still
seeing degraded performance compared to 4.x.  They had been running an earlier 6.4.x release
until they were advised about this issue.

Looking at per-thread CPU utilization, the top threads on 6.4.2 all have names starting with
qtp, which I believe means they are Jetty threads.

https://gist.github.com/msporleder-work/7313ebedbdab2e178ca0aa2e889d006b

If I'm not mistaken, we enabled container-level metrics with the changes that went into 6.4.0.
If that's true, do we perhaps have those metrics dialed up to 11?

> Serious performance degradation in Solr 6.4.1 due to the new metrics collection
> -------------------------------------------------------------------------------
>
>                 Key: SOLR-10130
>                 URL: https://issues.apache.org/jira/browse/SOLR-10130
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: metrics
>    Affects Versions: 6.4, 6.4.1
>         Environment: Centos 7, OpenJDK 1.8.0 update 111
>            Reporter: Ere Maijala
>            Assignee: Andrzej Bialecki 
>            Priority: Blocker
>              Labels: perfomance
>             Fix For: 6.4.2, master (7.0)
>
>         Attachments: SOLR-10130.patch, SOLR-10130.patch, solr-8983-console-f1.log
>
>
> We've stumbled on serious performance issues after upgrading to Solr 6.4.1. It looks like
the new metrics collection system in MetricsDirectoryFactory is causing a major slowdown.
This happens with an index configuration that, as far as I can see, has no metrics-specific
configuration and uses luceneMatchVersion 5.5.0. In practice, a moderate load will completely
bog down the server, with Solr threads constantly using up all CPU capacity (600% on a 6-core
machine) under a workload where we normally see an average load of < 50%.
> I took stack traces (I'll attach them) and noticed that the threads are spending time
in com.codahale.metrics.Meter.mark. I tested building Solr 6.4.1 with the metrics collection
disabled in MetricsDirectoryFactory getByte and getBytes methods and was unable to reproduce
the issue.
> As far as I can see there are several issues:
> 1. Collecting metrics on every single byte read is slow.
> 2. Having it enabled by default is not a good idea.
> 3. The comment "enable coarse-grained metrics by default" at https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104
implies that only coarse-grained metrics should be enabled by default, which contradicts
collecting metrics on every single byte read.
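
To illustrate point 1 of the quoted description, here is a minimal sketch of why marking a meter on every byte read adds up. It does not use the real com.codahale.metrics.Meter or any Solr code: CountingMeter, readPerByte and readBulk are hypothetical stand-ins introduced only for this example, and the 8192-byte chunk size is an arbitrary assumption. The sketch only counts how many times mark() is invoked for the same amount of data; a real meter is likely to do more work per call than this counter does.

```java
import java.util.concurrent.atomic.LongAdder;

public class ByteMeterSketch {

    /** Hypothetical stand-in for a metrics meter: records call count and total units. */
    static final class CountingMeter {
        private final LongAdder calls = new LongAdder();
        private final LongAdder total = new LongAdder();

        void mark(long n) {
            calls.increment();
            total.add(n);
        }

        long calls() { return calls.sum(); }
        long total() { return total.sum(); }
    }

    /** Simulates a read path that marks the meter once for every single byte. */
    static void readPerByte(byte[] data, CountingMeter meter) {
        for (int i = 0; i < data.length; i++) {
            meter.mark(1); // one metrics call per byte read
        }
    }

    /** Simulates a read path that marks the meter once per chunk of bytes. */
    static void readBulk(byte[] data, int chunkSize, CountingMeter meter) {
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            meter.mark(len); // one metrics call per chunk
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[1 << 20]; // 1 MiB of simulated index data
        CountingMeter perByte = new CountingMeter();
        CountingMeter bulk = new CountingMeter();

        readPerByte(data, perByte);
        readBulk(data, 8192, bulk);

        // Both meters see the same byte total, but with very different call counts.
        System.out.println("per-byte mark() calls: " + perByte.calls());
        System.out.println("bulk mark() calls:     " + bulk.calls());
    }
}
```

For 1 MiB of data the per-byte path calls mark() 1,048,576 times, while the chunked path calls it 128 times for the same byte total. Any fixed per-call overhead in the meter is multiplied accordingly, which is consistent with the stack traces showing time spent in Meter.mark.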




