ambari-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From GitBox <...@apache.org>
Subject [GitHub] [ambari] payert opened a new pull request #3225: AMBARI-25549 NegativeArraySizeException thrown when invoking CurrentCollectorHost
Date Wed, 09 Sep 2020 15:22:29 GMT

payert opened a new pull request #3225:
URL: https://github.com/apache/ambari/pull/3225


   AbstractTimelineMetricsSink is used in multi-threaded manner by Streams Messaging Manager
application to fetch and push metrics to Ambari Metric Collector. 
   When all the AMS live nodes are down(http://localhost:6188/ws/v1/timeline/metrics/livenodes),
the method getCurrentCollectorHost throws NegativeArraySizeException.
   
   `java.lang.NegativeArraySizeException: null at java.util.AbstractCollection.toArray(AbstractCollection.java:136)
at java.util.ArrayList.<init>(ArrayList.java:178) at org.apache.hadoop.metrics2.sink.timeline.AbstractTimelineMetricsSink$1.get(AbstractTimelineMetricsSink.java:460)
at org.apache.hadoop.metrics2.sink.timeline.AbstractTimelineMetricsSink$1.get(AbstractTimelineMetricsSink.java:450)
at org.apache.hadoop.metrics2.sink.relocated.google.common.base.Suppliers$ExpiringMemoizingSupplier.get(Suppliers.java:192)
at org.apache.hadoop.metrics2.sink.timeline.AbstractTimelineMetricsSink.getCurrentCollectorHost(AbstractTimelineMetricsSink.java:264)
at com.hortonworks.smm.kafka.services.metric.ams.AMSMetricsFetcher.getCollectorAPIUri(AMSMetricsFetcher.java:231)
`
   
   ## What changes were proposed in this pull request?
   The `allKnownLiveCollectors` `Set` that keeps tracking the live hosts should be protected
from concurrent access because concurrent `remove()` call can result in negative array size.
   Also the null checking and instantiation of `targetCollectorHostSupplier` should be protected
since tests revealed occasional NullPointerExceptions when dereferencing the variable.
   
   The proposed solution is synchronising some parts of `getCurrentCollectorHost()`.
   The implementation is using a `Supplier` where the `get()` method is synchronised under
the hood and the synchronising object is the `delegate` object. For this reason I have chosen
the `collectorShardSupplier` object as synchronising object within the class because it is
the same object that `get() `is using.
   
   ## How was this patch tested?
   - Unit test has been added that exercises the `AbstractTimelineMetricsSink` in a concurrent
way and is able to reproduce the NegativeArraySize exception.
   - Testing on live cluster is in progress. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



Mime
View raw message