ignite-dev mailing list archives

From Denis Mekhanikov <dmekhani...@gmail.com>
Subject Re: [DISCUSSION] Performance issue with cluster-wide cache metrics distribution
Date Tue, 04 Dec 2018 12:13:49 GMT

Did you measure the impact of metrics collection? What is the overhead you
are trying to avoid?

Just to be clear, MetricUpdateMessage-s are used as heartbeats,
so they are sent anyway, even if no metrics are distributed between nodes.


On Tue, Dec 4, 2018 at 12:46, Alex Plehanov <plehanov.alex@gmail.com> wrote:

> Hi Igniters,
> In the current implementation, cache metrics are collected on each node and
> sent across the whole cluster in a discovery message
> (TcpDiscoveryMetricsUpdateMessage) at the configured frequency
> (MetricsUpdateFrequency, 2 seconds by default), even if no one requested
> them.
> If there are a lot of caches and a lot of nodes in the cluster, the metrics
> update message (which contains each metric for each cache on each node) can
> reach a critical size.
> Also, frequently collecting all cache metrics has a negative performance
> impact (some of them just read a value from an AtomicLong, but some of them
> need an iteration over all cache partitions).
> The only way now to disable collecting cache metrics and sending them in the
> discovery message is to disable statistics for each cache. But this also
> makes it impossible to request some cache metrics locally (for the current
> node only). Requesting a limited set of cache metrics on the current node
> doesn't have the same performance impact as frequently collecting all
> cache metrics, and sometimes it's enough for diagnostic purposes.
> As a workaround I have filed and implemented ticket [1], which introduces
> a new system property to disable sending cache metrics with
> TcpDiscoveryMetricsUpdateMessage (when this property is set, the message
> will contain only node metrics). But a system property is not good as a
> permanent solution. Perhaps it's better to move this property to the public
> API (to IgniteConfiguration, for example).
> Also, maybe we should change the cache metrics distribution strategy? For
> example, collect metrics on request via the communication SPI, or subscribe
> to a limited set of caches/metrics, etc.
> Thoughts?
> [1]: https://issues.apache.org/jira/browse/IGNITE-10172
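
To put a rough number on the "critical size" concern in the quoted message, here is a back-of-the-envelope sketch. It only multiplies out the "each metric for each cache on each node" description; the function names, the metrics-per-cache count, and the 8 bytes per metric are hypothetical assumptions, not measurements of the real TcpDiscoveryMetricsUpdateMessage serialization. Only the 2-second default frequency comes from the message above.

```python
def estimate_metrics_payload_bytes(nodes, caches, metrics_per_cache,
                                   bytes_per_metric=8):
    """Rough size of cache metrics for every cache on every node.

    bytes_per_metric=8 assumes each metric is a single long value;
    real serialization overhead is not modeled here.
    """
    return nodes * caches * metrics_per_cache * bytes_per_metric


def estimate_bytes_per_second(payload_bytes, update_frequency_ms=2000):
    """Payload volume per second at the given update frequency.

    2000 ms matches the default MetricsUpdateFrequency cited above.
    """
    return payload_bytes * 1000 / update_frequency_ms


# Hypothetical cluster: 100 nodes, 200 caches, 100 metrics per cache.
payload = estimate_metrics_payload_bytes(nodes=100, caches=200,
                                         metrics_per_cache=100)
rate = estimate_bytes_per_second(payload)
```

Under these assumed inputs the payload grows linearly in each of the three factors, which is why a large number of caches on a large cluster is the problematic case.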
