ignite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Vladimir Ozerov <voze...@gridgain.com>
Subject Re: [DISCUSSION] Performance issue with cluster-wide cache metrics distribution
Date Tue, 04 Dec 2018 12:22:03 GMT
Hi Alex,

Agree with you. Most of the time these distribution of metrics is not
needed. In future we will have more and more information which potentially
needs to be shared between nodes. E.g. IO statistics, SQL statistics for
query optimizer, SQL execution history, etc. We need common mechanics for
this, so I vote for your proposal:
1) Data is collected locally
2) If a node needs to collect data from the cluster, it sends explicit
request over communication SPI
3) For performance reasons we may consider caching - return previously
collected metrics without re-requesting them again if they are not too old

On Tue, Dec 4, 2018 at 12:46 PM Alex Plehanov <plehanov.alex@gmail.com>

> Hi Igniters,
> In the current implementation, cache metrics are collected on each node and
> sent across the whole cluster with discovery message
> (TcpDiscoveryMetricsUpdateMessage) with configured frequency
> (MetricsUpdateFrequency, 2 seconds by default) even if no one requested
> them.
> If there are a lot of caches and a lot of nodes in the cluster, metrics
> update message (which contain each metric for each cache on each node) can
> reach a critical size.
> Also frequently collecting all cache metrics have a negative performance
> impact (some of them just get values from AtomicLong, but some of them need
> an iteration over all cache partitions).
> The only way now to disable cache metrics collecting and sending with
> discovery message is to disable statistics for each cache. But this also
> makes impossible to request some of cache metrics locally (for the current
> node only). Requesting a limited set of cache metrics on the current node
> doesn't have such performance impact as the frequent collecting of all
> cache metrics, but sometimes it's enough for diagnostic purposes.
> As a workaround I have filled and implemented ticket [1], which introduces
> new system property to disable cache metrics sending with
> TcpDiscoveryMetricsUpdateMessage (in case this property is set, the message
> will contain only node metrics). But system property is not good for a
> permanent solution. Perhaps it's better to move such property to public API
> (to IgniteConfiguration for example).
> Also maybe we should change cache metrics distributing strategy? For
> example, collect metrics by request via communication SPI or subscribe to a
> limited set of cache/metrics, etc.
> Thoughts?
> [1]: https://issues.apache.org/jira/browse/IGNITE-10172

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message