ignite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Zhenya Stanilovsky <arzamas...@mail.ru.INVALID>
Subject Re: [DISCUSSION] Performance issue with cluster-wide cache metrics distribution
Date Tue, 04 Dec 2018 12:17:44 GMT
hi, Alex.
1. metrics through discovery require refactoring.
2. local cache metrics should be available (if configured) on each node.
3. there must be an opportunity to configure metrics in runtime.


>Hi Igniters,
>In the current implementation, cache metrics are collected on each node and
>sent across the whole cluster with discovery message
>(TcpDiscoveryMetricsUpdateMessage) with configured frequency
>(MetricsUpdateFrequency, 2 seconds by default) even if no one requested
>If there are a lot of caches and a lot of nodes in the cluster, metrics
>update message (which contain each metric for each cache on each node) can
>reach a critical size.
>Also frequently collecting all cache metrics have a negative performance
>impact (some of them just get values from AtomicLong, but some of them need
>an iteration over all cache partitions).
>The only way now to disable cache metrics collecting and sending with
>discovery message is to disable statistics for each cache. But this also
>makes impossible to request some of cache metrics locally (for the current
>node only). Requesting a limited set of cache metrics on the current node
>doesn't have such performance impact as the frequent collecting of all
>cache metrics, but sometimes it's enough for diagnostic purposes.
>As a workaround I have filled and implemented ticket [1], which introduces
>new system property to disable cache metrics sending with
>TcpDiscoveryMetricsUpdateMessage (in case this property is set, the message
>will contain only node metrics). But system property is not good for a
>permanent solution. Perhaps it's better to move such property to public API
>(to IgniteConfiguration for example).
>Also maybe we should change cache metrics distributing strategy? For
>example, collect metrics by request via communication SPI or subscribe to a
>limited set of cache/metrics, etc.
>[1]:  https://issues.apache.org/jira/browse/IGNITE-10172

Zhenya Stanilovsky
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message