lucene-dev mailing list archives

From "Erik Persson (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SOLR-11413) SolrGraphiteReporter fails to report metrics due to non-thread safe code
Date Mon, 09 Oct 2017 21:11:04 GMT

    [ https://issues.apache.org/jira/browse/SOLR-11413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197702#comment-16197702 ]

Erik Persson edited comment on SOLR-11413 at 10/9/17 9:10 PM:
--------------------------------------------------------------

[~ab] Unfortunately, the unit test that I had was misleading, so I removed it from the patch.
It appears to fail 'randomly'. Specifically, it failed when I ran it against un-patched Solr
and passed when I ran it against my patch. Subsequent runs were inconsistent.

My belief is that the problem lies in the embedded MockGraphite class in SolrGraphiteReporterTest.java.
I believe that the inconsistency in test results relates to how it handles concurrent
connections, but as of yet I cannot see why there would be a problem. I will take another
shot at it.


was (Author: erikpersson):
[~ab] Unfortunately, the unit test that I had was misleading. It appears to fail 'randomly'.
Specifically, it failed when I ran it against un-patched Solr and passed when I ran it against
my patch. Subsequent runs were inconsistent.

My belief is that the problem lies in the embedded MockGraphite class in SolrGraphiteReporterTest.java.
I believe that the inconsistency in test results relates to how it handles concurrent
connections, but as of yet I cannot see why there would be a problem. I will take another
shot at it.

> SolrGraphiteReporter fails to report metrics due to non-thread safe code
> ------------------------------------------------------------------------
>
>                 Key: SOLR-11413
>                 URL: https://issues.apache.org/jira/browse/SOLR-11413
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: metrics
>    Affects Versions: 6.6, 7.0
>            Reporter: Erik Persson
>            Assignee: Andrzej Bialecki 
>         Attachments: SOLR-11413.patch
>
>
> Symptom:
> Intermittent errors writing graphite metrics. Errors indicate use of sockets which have
> already been closed.
> Cause:
> SolrGraphiteReporter caches and shares dropwizard Graphite instances. These reporters
> are not thread safe, as they open and close an instance variable of type GraphiteSender.
> On modern bare metal hardware this problem was observed consistently, and resulted in the
> majority of metrics failing to be delivered to graphite.
> Proposed Fix:
> Graphite (and PickledGraphite) are not designed to be cached, and should not be.
> Test:
> Patch file includes test which forces error.
> Alternative Fixes Considered:
> * Totally change solr metrics architecture to use a single metrics registry - seems undesirable
>   and impractical
> * Create a synchronized or otherwise thread-safe implementation of dropwizard graphite
>   reporter - should be fixed upstream in dropwizard and not obviously preferred to current model
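
The failure mode described under "Cause" can be illustrated with a small, self-contained sketch. It does not use the dropwizard or Solr classes: FakeSender is a hypothetical stand-in that, by assumption, holds a single connection as instance state and is connected, written to, and closed once per report cycle. The interleaving is replayed on one thread so that the outcome is deterministic. In a real deployment the ordering would depend on thread scheduling, which would be consistent with the intermittent errors reported above.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedSenderSketch {

  /** Hypothetical stand-in for a sender that keeps one connection as instance state. */
  static class FakeSender {
    private boolean open = false;
    final List<String> delivered = new ArrayList<>();

    void connect() {
      open = true;
    }

    void send(String metric) {
      if (!open) {
        throw new IllegalStateException("socket already closed");
      }
      delivered.add(metric);
    }

    void close() {
      open = false;
    }
  }

  /**
   * Replays one possible interleaving of two report cycles, one using
   * sender a and one using sender b, and returns the number of failed sends.
   */
  static int interleave(FakeSender a, FakeSender b) {
    int failures = 0;
    a.connect();             // reporter A starts its cycle
    b.connect();             // reporter B starts its cycle
    a.send("a.count 1");     // A writes its metric
    a.close();               // A finishes and closes its connection
    try {
      b.send("b.count 1");   // B writes; fails if b is the same object as a
    } catch (IllegalStateException e) {
      failures++;
    }
    b.close();
    return failures;
  }

  public static void main(String[] args) {
    FakeSender shared = new FakeSender();
    System.out.println("shared instance failures:   " + interleave(shared, shared));
    System.out.println("separate instance failures: "
        + interleave(new FakeSender(), new FakeSender()));
  }
}
```

With one shared instance, the second reporter's write lands on a connection that the first reporter has already closed. With one instance per reporter, the same interleaving completes without error, which is the idea behind not caching the instances.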




