ignite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrey Gura <ag...@apache.org>
Subject [IEP-35] Metrics management in Ignite
Date Mon, 26 Aug 2019 17:36:25 GMT
Hi, Igniters!

I'm working on some metrics related aspects and it seems that we have
missed some points regarding the metrics management. There are at
least two major problems from my point of view.

Problem definition.

1. There is no unified approach to adding metrics during development.

It would be grate to have one common place where developer should add
new metrics or can find answer for question "What kind of metrics do
we have for smth?". I talk only about Ignite internal metrics, not
user's metrics.

Now the logic is spread throughout the code base and there is now any
way to find proper place where particular metric could be added to the
registry. Moreover, we can create registry in one place and add some
set of metrics to the registry in another place and we have many
examples of it.

Of course, metrics API allows it and it is meaningful drawback from my
point view. MetricRegistry is mutable and could be modified at any
place of the code and any time of program execution. It  allows
developer follow the simplest way which is usually wrong. Also
GridMetricManager.registry() method has two responsibilities: it
creates and lookups registries. When I see registry() method call in
the code I can't assume what exactly intention had developer (is it
first creation of registry or modification of existing registry).

2. There is no unified approach to enabling/disabling metrics.

Here I mean both problems: runtime enabling/disabling and code design
that should allow do it in proper way. There is JIRA issue [1] about

Now some metrics can be enabled/disabled in static configuration and
at runtime (cache metrics for example) and other don't have such
functionality. It should be unified. Any metrics set (eq.
MetricRegistry in current implementation) should have one unified
approach for enabling/disabling. Also we should provide infrastructure
for it as part of metrics framework.

Proposed solution.

1. Introduce new interface, e.g. MetriсSource that is responsible for
metrics life cycle for one metrics source (e.g. cache metrics, system
metrics, etc.) Each component with metrics should register own
implementation of this interface in metrics manager

Interface declares the following methods:

- String name() - returns sources name that will be available in
special MBean responsible for access for metrics manager.

- MetricRegistry enable() - returns MetricRegistry instance with
allready registered metrics. Metric registry has the same name as name
returned by MetricSource.name() method. This method can be invoked on
node start (if metrics are statically enabled) or as result of metric
enabling via MBean. As result all instance of MetricSource will have
all metric instances initialized and MetricRegistry must be added to
the registries map by GridMetricManager.

- void disable() - Should be invoked after removing corresponding
MetricRegistry from GridMetricManager.registries. The responsibility
is destroying of all metric instances (just assign null). This method
can be invoked during stopping component (e.g. cache stop, index
removing) or as result of metric disabling via MBean.

- boolean enabled() - Actual state for current implemntation.

NOTE 1: Adding/removing metric registries in GridMetricManager should
be performed with copy-on-write semantic in order to avoid possible
data races in related threads (e.g. exporters). It also allows avoid
data copy during exporting (just use link to the registries). Also we
can avoid using ConcurrentHashMap in such case. But of course
additional copies will be created per each registries map
modification. But we assume that it isn't frequent operation.

NOTE 2: MetricSource interface specific implementation is also
responsible for keeping direct references to metric instances and also
for functionality related with metric values calculation (including
checking whether metrics are enabled or not).

NOTE 3: GridMetricManager is responsible for
registration/deregistration of MBeans corresponding to each metrics
source. GridMetricManager is responsible for proper actions sequence
(e.g. we should first create metric registry and after it add it to
registries collection).

At this point we have:

- Way to find out all metrics in the system just navigating throughout
all implementations of MetricSource interface.
- Unified approach to metric creation.
- Unified approach to metric enabling/disabling at runtime.

2. Introduce MetricRegistryBuilder that returns immutable instance of
MetricRegistry. It will force developer to write code related with
metric creation at exactly one place. It also allows to avoid
MetricRegistry modification at runtime. Also we should remove
GridMetricRegistry.registry() method and introduce two new methods:
register(MetricRegistry) and unregister(String registryName). Each
method should assert that no old value for register() case or registry
was deleted for unregister case (we have some problems with it now,
e.g. for cache with near cache configuration).

At this point we have some additional guaranties of proper metrics
framework usage. Such design also will allow optimize metric exporters
because it can sure that metric registry will not modified at runtime
with out removing and conseqent adding registry to the
GridMetricManager.registires map.

3. Introduce MetricManagementMBean that returns list of all metric
sources with boolean flag that indicates whether metrics are enabled
or disabled for this source. List of all sources should be supported
by GridMetricManager (so metricSources collection should be added and

This MBean also should provide methods for enabling and disabling metrics.

At this point we have interface for enabling/disabling metrics via JMX
and basis for implementation other tools (e.g. we can manage metrics
via control.sh after additional development).

All current metrics related code should be refactored accordingly this
proposal and it is most complex and time consuming part.

I'm ready to do all this things. Also anybody can implement and contribute this.

But first I would like to hear your thoughts and ideas.


[1] https://issues.apache.org/jira/browse/IGNITE-11927

View raw message