ignite-dev mailing list archives

From Alexey Goncharuk <alexey.goncha...@gmail.com>
Subject Re: [DISCUSSION][IEP-35] Metrics configuration
Date Wed, 07 Aug 2019 08:32:06 GMT
This is a tough topic. However, in my opinion, static bucket configuration
for histograms will work worse than no configuration at all. The reason is
simple: complexity. If we assume that the bucket configuration needs to
change for some metric, we can also assume that this metric may be related
to a cache, and that different caches require different metric
configurations. Combined with the ability to start caches dynamically, this
leaves no simple way to change the metric for dynamically started caches.

From my experience, properly chosen buckets are usually good enough to
troubleshoot most cases. For advanced scenarios, I would delay exposing the
configuration and instead design properly structured control.sh commands
that use either the metastore or whatever other system we choose to keep
this configuration. Managing configuration files in large clusters is a big
pain; there is no need to make it worse.
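
To illustrate what predefined buckets mean in practice, here is a minimal
Java sketch of a histogram whose bounds are fixed by the metric owner. It
uses the example latency bounds quoted later in this thread (5, 10, 25, 50,
100, 200, 500 ms). The class FixedBucketHistogram and its methods are
hypothetical names chosen for this illustration; this is not Ignite's
HistogramMetric implementation, only a sketch under those assumptions.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical illustration only; not Ignite's HistogramMetric. */
final class FixedBucketHistogram {
    /** Inclusive upper bounds of the buckets, strictly ascending (e.g. latency in ms). */
    private final long[] bounds;

    /** One counter per bound plus a final overflow bucket. */
    private final AtomicLongArray counts;

    FixedBucketHistogram(long... bounds) {
        if (bounds.length == 0)
            throw new IllegalArgumentException("At least one bound is required");

        for (int i = 1; i < bounds.length; i++) {
            if (bounds[i] <= bounds[i - 1])
                throw new IllegalArgumentException("Bounds must be strictly ascending");
        }

        this.bounds = bounds.clone();
        this.counts = new AtomicLongArray(bounds.length + 1);
    }

    /** Counts the value in the first bucket whose upper bound is >= value. */
    void record(long value) {
        int idx = Arrays.binarySearch(bounds, value);

        // When the value is absent, binarySearch returns -(insertionPoint) - 1.
        int bucket = idx >= 0 ? idx : -idx - 1;

        counts.incrementAndGet(bucket);
    }

    /** Returns a copy of the per-bucket counters; the last element is the overflow bucket. */
    long[] snapshot() {
        long[] res = new long[counts.length()];

        for (int i = 0; i < res.length; i++)
            res[i] = counts.get(i);

        return res;
    }

    public static void main(String[] args) {
        // Bounds are chosen once by the metric owner; nothing is configured at runtime.
        FixedBucketHistogram latency = new FixedBucketHistogram(5, 10, 25, 50, 100, 200, 500);

        for (long ms : new long[] {3, 5, 7, 120, 1000})
            latency.record(ms);

        System.out.println(Arrays.toString(latency.snapshot()));
    }
}
```

With roughly exponential bounds, seven buckets cover two orders of magnitude,
which is why a single predefined set can serve many caches without per-cache
configuration.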

Tue, Aug 6, 2019 at 15:32, Ivan Pavlukhin <vololo100@gmail.com>:

> Andrey,
>
> It seems that the screenshot was rejected; I do not see it.
>
> Tue, Aug 6, 2019 at 15:07, Andrey Gura <agura@apache.org>:
> >
> > A good example of proper bucket configuration. Such a configuration is
> > suitable for many cases. See the attached screenshot (I hope it will not be
> > rejected by the mail system or forum).
> >
> >
> > On Tue, Aug 6, 2019 at 2:45 PM Andrey Gura <agura@apache.org> wrote:
> > >
> > > > What do you mean by "exponential bounds"?
> > >
> > > Something like this if we talk about latency in ms for example: 5, 10,
> > > 25, 50, 100, 200, 500, ...
> > >
> > > > Thanks for the feedback, I appreciate your honesty.
> > >
> > > Nothing personal. It is just about functionality from the user's
> > > standpoint.
> > >
> > > > What is your proposal?
> > > > How metrics configuration should work?
> > >
> > > My proposal is simple: just drop this change. We don't need the
> > > configuration. The metric owner (developer) defines the bucket bounds for
> > > each particular case (uniformly or exponentially, depending on the
> > > metric and the problem definition).
> > >
> > > On Mon, Aug 5, 2019 at 6:36 PM Nikolay Izhikov <nizhikov@apache.org> wrote:
> > > >
> > > > Hello, Andrey.
> > > >
> > > > > Not necessary if we have exponential bounds' values for histograms.
> > > >
> > > > What do you mean by "exponential bounds"?
> > > >
> > > > > Anyway, in current solution it looks ugly and not usable.
> > > >
> > > > Thanks for the feedback, I appreciate your honesty.
> > > >
> > > > > No. But we should admit that this is a bad decision and not include
> > > > > this change in the code base.
> > > >
> > > > What is your proposal?
> > > > How metrics configuration should work?
> > > >
> > > > > Yes. But it still will not give enough accuracy.
> > > >
> > > > Enough for what?
> > > >
> > > > On Mon, 05/08/2019 at 18:29 +0300, Andrey Gura wrote:
> > > > > > > - metric configuration is node local (not cluster wide).
> > > > > > This issue is easy to solve on the user-side and in Ignite core.
> > > > >
> > > > > That simplicity is imaginary. First, you need some additional
> > > > > automation on the user side in order to configure all nodes of the
> > > > > cluster. Second, new nodes can join the cluster, and the
> > > > > configuration on a new node will differ from that on the other nodes.
> > > > > This complicates the whole functionality. Anyway, I don't like such
> > > > > a simplified solution because at the moment it brings more problems
> > > > > than value.
> > > > >
> > > > > > The easiest solution was implemented.
> > > > > > Do we want to make it more complex right now :)?
> > > > >
> > > > > No. But we should admit that this is a bad decision and not include
> > > > > this change in the code base.
> > > > >
> > > > > > The reason it exists in the PR is that we already have this
> > > > > > parameter in DataStorageConfiguration#getMetricsSubIntervalCount
> > > > >
> > > > > I believe this method should be deprecated and removed in a major
> > > > > release.
> > > > >
> > > > > > I think the user should be able to configure buckets for
> > > > > > histograms and rateTimeInterval for hit rates.
> > > > >
> > > > > Not necessary if we have exponential bounds' values for histograms.
> > > > > Anyway, in the current solution it looks ugly and is not usable.
> > > > >
> > > > > > Ignite has dozens of use cases and deployment modes; it seems
> > > > > > we can't cover them all with a single predefined
> > > > > > buckets/rateTimeInterval set.
> > > > >
> > > > > Yes. But it still will not give enough accuracy.
> > > > >
> > > > > On Mon, Aug 5, 2019 at 5:25 PM Nikolay Izhikov <nizhikov@apache.org> wrote:
> > > > > >
> > > > > > Hello, Andrey.
> > > > > >
> > > > > > > - metric configuration is node local (not cluster wide).
> > > > > >
> > > > > > This issue is easy to solve on the user-side and in Ignite core.
> > > > > >
> > > > > > > - metric configuration doesn't survive node restart.
> > > > > >
> > > > > > We decided to go with the simplest solution for now.
> > > > > > The easiest solution was implemented.
> > > > > > Do we want to make it more complex right now :)?
> > > > > >
> > > > > > > - User shouldn't configure hit rate metrics at runtime in most cases.
> > > > > >
> > > > > > I agree with you - the size of the counters array looks odd as a
> > > > > > configuration parameter.
> > > > > > The reason it exists in the PR is that we already have this
> > > > > > parameter in DataStorageConfiguration#getMetricsSubIntervalCount
> > > > > >
> > > > > > > - May be it is enough for user to have histograms with pre-configured buckets
> > > > > > > So I think we should drop this change and idea about runtime
> > > > > > > histogram and hit rate configuration.
> > > > > >
> > > > > > I think the user should be able to configure buckets for
> > > > > > histograms and rateTimeInterval for hit rates.
> > > > > >
> > > > > > Ignite has dozens of use cases and deployment modes; it seems
> > > > > > we can't cover them all with a single predefined
> > > > > > buckets/rateTimeInterval set.
> > > > > >
> > > > > > On Mon, 05/08/2019 at 16:59 +0300, Andrey Gura wrote:
> > > > > > > Igniters,
> > > > > > >
> > > > > > > I've taken a look at the PR and I want to follow up on this
> > > > > > > discussion again.
> > > > > > >
> > > > > > > The proposed solution has a couple of significant drawbacks:
> > > > > > >
> > > > > > > - metric configuration is node local (not cluster wide).
> > > > > > > - metric configuration doesn't survive node restart.
> > > > > > >
> > > > > > > These drawbacks make configuration complex, annoying and
> > > > > > > useless in most cases.
> > > > > > >
> > > > > > > Moreover, I think that:
> > > > > > >
> > > > > > > - User shouldn't configure hit rate metrics at runtime in most
> > > > > > > cases. Especially HitRateMetric.size, because it's just an
> > > > > > > implementation detail. The purpose of size is plot smoothing, and
> > > > > > > this parameter could be fixed (e.g. 16 is enough). A HitRate metric
> > > > > > > is just a LongMetric with an additional feature.
> > > > > > > - Maybe it is enough for the user to have histograms with
> > > > > > > pre-configured buckets. The trick here is properly chosen bounds. It
> > > > > > > seems that exponentially chosen values will fit most cases. So we
> > > > > > > can avoid runtime configuration for histograms.
> > > > > > > - We can also provide a percentile metric for more accurate
> > > > > > > measurements. Yes, it will bring additional performance impact and
> > > > > > > accuracy will not be the best. But it will be clearer and will not
> > > > > > > require configuration.
> > > > > > >
> > > > > > > So I think we should drop this change and the idea of runtime
> > > > > > > histogram and hit rate configuration.
> > > > > > >
> > > > > > > Thoughts?
> > > > > > >
> > > > > > > On Tue, Jul 9, 2019 at 2:06 PM Nikolay Izhikov <nizhikov@apache.org> wrote:
> > > > > > > >
> > > > > > > > Igniters,
> > > > > > > >
> > > > > > > > I made a PR for metrics configuration.
> > > > > > > >
> > > > > > > > Please, review
> > > > > > > >
> > > > > > > > https://github.com/apache/ignite/pull/6676/files
> > > > > > > >
> > > > > > > > On Tue, 09/07/2019 at 12:27 +0300, Nikolay Izhikov wrote:
> > > > > > > > > Hello, Alex.
> > > > > > > > >
> > > > > > > > > OK, let's go with the simplest solution.
> > > > > > > > > I will provide an API and a JMX method for metrics
> > > > > > > > > configuration shortly.
> > > > > > > > >
> > > > > > > > > On Mon, 08/07/2019 at 18:23 +0300, Alexey Goncharuk wrote:
> > > > > > > > > > Nikolay,
> > > > > > > > > >
> > > > > > > > > > To me, a separate metrics configuration file seems not very
> > > > > > > > > > user-friendly. First of all, it does not allow configuring the
> > > > > > > > > > system from Java code only. Second, having multiple configuration
> > > > > > > > > > files seems to be quite confusing for end users (judging by the
> > > > > > > > > > logging configuration questions).
> > > > > > > > > >
> > > > > > > > > > Perhaps we will still end up with the configuration file, but for
> > > > > > > > > > now I would put this aside for a more thorough brainstorm and add
> > > > > > > > > > the JMX and internal API for changing metrics configuration.
> > > > > > > > > >
> > > > > > > > > > Fri, Jul 5, 2019 at 14:17, Seliverstov Igor <gvvinblade@gmail.com>:
> > > > > > > > > >
> > > > > > > > > > > Igniters,
> > > > > > > > > > >
> > > > > > > > > > > One more question on topic.
> > > > > > > > > > >
> > > > > > > > > > > Should we preserve metrics configuration on restart? (I think we
> > > > > > > > > > > should.)
> > > > > > > > > > >
> > > > > > > > > > > If so, which configuration should be used after restart: the one
> > > > > > > > > > > defined in the config file or the one saved in config storage? (I
> > > > > > > > > > > guess the saved configuration should have priority.)
> > > > > > > > > > >
> > > > > > > > > > > So, how do we tell users that changes in the configuration file
> > > > > > > > > > > have no effect on Ignite configuration after the first start?
> > > > > > > > > > >
> > > > > > > > > > > I think there are too many open questions, and (at least for now)
> > > > > > > > > > > we should provide only the JMX API until all of the questions are
> > > > > > > > > > > clarified.
> > > > > > > > > > >
> > > > > > > > > > > Regards,
> > > > > > > > > > > Igor
> > > > > > > > > > >
> > > > > > > > > > > > On July 4, 2019, at 19:55, Nikolay Izhikov <nizhikov@apache.org> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Hello, Andrey.
> > > > > > > > > > > >
> > > > > > > > > > > > > > 3. I can't imagine that adequate values will be chosen on project
> > > > > > > > > > > > > > setup stage.
> > > > > > > > > > > > >
> > > > > > > > > > > > > A configuration file is required when we add a new node to the
> > > > > > > > > > > > > cluster or replace an existing one.
> > > > > > > > > > > > > The user can have parameters similar to the Ignite configuration
> > > > > > > > > > > > > and log configuration files.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > My proposal is adding API for boundaries configuration to the
> > > > > > > > > > > > > > metrics framework and expose it via JMX
> > > > > > > > > > > > >
> > > > > > > > > > > > > Agree. I think we should have both:
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1. Configuration file.
> > > > > > > > > > > > > 2. JMX API to change boundaries of histogram *and HitRateMetric params*.
> > > > > > > > > > > > >
> > > > > > > > > > > > > But if you and other community members are against the config file,
> > > > > > > > > > > > > let's have only JMX.
> > > > > > > > > > > > > It seems JMX will provide the required level of configurability for
> > > > > > > > > > > > > metrics.
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, 04/07/2019 at 17:53 +0300, Andrey Gura wrote:
> > > > > > > > > > > > > Igniters,
> > > > > > > > > > > > >
> > > > > > > > > > > > > > I rethought the issue and I see some problems:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 1. It seems that in most cases configuring bucket boundaries will be
> > > > > > > > > > > > > > a problem for the user. Absolute values for latency boundaries are a
> > > > > > > > > > > > > > very odd choice.
> > > > > > > > > > > > > > 2. It also seems that latency for most caches (if we configure cache
> > > > > > > > > > > > > > metrics, for example) will be similar.
> > > > > > > > > > > > > > 3. I can't imagine that adequate values will be chosen at the project
> > > > > > > > > > > > > > setup stage. So the chosen values will have to be changed in the
> > > > > > > > > > > > > > future.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > A solution with a configuration file looks unnatural and creates more
> > > > > > > > > > > > > > problems than it could solve.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > My proposal is to add an API for boundaries configuration to the
> > > > > > > > > > > > > > metrics framework and expose it via JMX (at this step). It still
> > > > > > > > > > > > > > provides the possibility of configuration but doesn't force the user
> > > > > > > > > > > > > > to do it.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Also, we should choose default values for bucket boundaries. And this
> > > > > > > > > > > > > > is the most complex problem at the moment :) Let's discuss it.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, Jul 3, 2019 at 4:49 PM Andrey Gura <agura@apache.org> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Nikolai,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > A metric is disabled if it doesn't allocate any memory and doesn't
> > > > > > > > > > > > > > > update any variable, because it doesn't have any value. Ideally,
> > > > > > > > > > > > > > > disabling metrics for some cache should be equivalent to stopping
> > > > > > > > > > > > > > > the cache.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Fri, Jun 28, 2019 at 1:02 PM Nikolay Izhikov <nizhikov@apache.org> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hello, Alexey.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks for the feedback!
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > My only concern is that we should have the metrics framework
> > > > > > > > > > > > > > > > > configuration as a first-class citizen of the framework itself
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Yes. I planned to add a `void configure(String param)` method to
> > > > > > > > > > > > > > > > the metric API.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > but change the metrics parameters in
> > > > > > > > > > > > > > > > > runtime from JMX or command-line, etc.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I've added the requirement for a JMX method to the ticket:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > https://issues.apache.org/jira/browse/IGNITE-11927
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Another concern is to have an
> > > > > > > > > > > > > > > > > ability to disable/enable metrics per metrics group/prefix.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Yes, we discussed it.
> > > > > > > > > > > > > > > > But let's make it clear:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > *What is disabling a metric?*
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > It looks like an exporter filter solves this task.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Thu, 27/06/2019 at 16:24 +0300, Alexey Goncharuk wrote:
> > > > > > > > > > > > > > > > Nikolay,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > My only concern is that we should have the metrics framework
> > > > > > > > > > > > > > > > > configuration as a first-class citizen of the framework itself.
> > > > > > > > > > > > > > > > > This way, we can configure the metrics not only from a file, but
> > > > > > > > > > > > > > > > > also change the metrics parameters at runtime from JMX, the
> > > > > > > > > > > > > > > > > command line, etc. Another concern is to have the ability to
> > > > > > > > > > > > > > > > > disable/enable metrics per metrics group/prefix.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > The logger-like configuration meets these suggestions, given that
> > > > > > > > > > > > > > > > > the configuration is generalized into the metrics framework.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > What do you think?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thu, Jun 27, 2019 at 12:30, Nikolay Izhikov <nizhikov@apache.org>:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Hello, Igniters.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > As you may know, I've contributed Phase 1 [1] for IEP-35 [2].
> > > > > > > > > > > > > > > > > > Now we have a metrics subsystem and can create and export any
> > > > > > > > > > > > > > > > > > metrics from Ignite.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I think the user (an administrator of Ignite) should be able to
> > > > > > > > > > > > > > > > > > configure some metrics params in a common way [3].
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I propose to use the same approach as logging frameworks.
> > > > > > > > > > > > > > > > > > We should define some file format Ignite can understand.
> > > > > > > > > > > > > > > > > > An administrator fills in a configuration file to configure one
> > > > > > > > > > > > > > > > > > or several metrics.
> > > > > > > > > > > > > > > > > > Ignite will analyze the file and use the provided params during
> > > > > > > > > > > > > > > > > > metrics creation.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > For now, we have 2 types of metrics that should be configured:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >   * HistogramMetric [4]
> > > > > > > > > > > > > > > > > >     This metric is a count of measurements that fall into
> > > > > > > > > > > > > > > > > >     predefined intervals.
> > > > > > > > > > > > > > > > > >     An example is "Request processing time distribution".
> > > > > > > > > > > > > > > > > >     We want to calculate a count of requests processed quicker
> > > > > > > > > > > > > > > > > >     than 50ms, 50-100, 100-250, 250-500 and slower.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >   * HitRateMetric [5]
> > > > > > > > > > > > > > > > > >     This metric is a count of events in the last time interval.
> > > > > > > > > > > > > > > > > >     An example is the "Count of requests processed in the last
> > > > > > > > > > > > > > > > > >     5 seconds".
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Example of file content:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > ````
> > > > > > > > > > > > > > > > > > cache.my-cache.GetLatency=50,100,250,500 #Params for the histogram metric with the name `cache.my-cache.get`
> > > > > > > > > > > > > > > > > > cache.my-cache.RebalancingKeysRate=60000 #Param for existing HitRateMetric that holds "Estimated rebalancing speed in keys".
> > > > > > > > > > > > > > > > > > ````
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Please, share your vision.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> https://github.com/apache/ignite/commit/fdaa310430aefff07994eb35510d3416886b5bbe
> > > > > > > > > > > > > > > > > [2]
> > > > > > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=112820392
> > > > > > > > > > > > > > > > > [3]
> https://issues.apache.org/jira/browse/IGNITE-11927
> > > > > > > > > > > > > > > > > [4]
> > > > > > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/metric/impl/HistogramMetric.java
> > > > > > > > > > > > > > > > > [5]
> > > > > > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/metric/impl/HitRateMetric.java
> > > > > > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
>
>
>
> --
> Best regards,
> Ivan Pavlukhin
>
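
The original proposal quoted above describes HitRateMetric as a count of
events in the last time interval, and Andrey notes that its size parameter
exists only to smooth plots. The following Java sketch shows one way such a
metric can work: the interval is split into size sub-interval counters and
stale counters are dropped as time advances. The class SlidingHitRate, its
methods, and the explicit timestamp arguments are hypothetical and chosen
for this illustration; it is not Ignite's HitRateMetric code, and it assumes
non-negative, non-decreasing timestamps.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not Ignite's HitRateMetric. */
final class SlidingHitRate {
    /** Length of one sub-interval in milliseconds. */
    private final long subIntervalMs;

    /** Hit counters, one per sub-interval slot. */
    private final long[] counters;

    /** Sub-interval number each slot currently holds, or -1 if never written. */
    private final long[] ticks;

    SlidingHitRate(long rateTimeIntervalMs, int size) {
        if (size < 2 || rateTimeIntervalMs < size)
            throw new IllegalArgumentException("Need size >= 2 and interval >= size");

        this.subIntervalMs = rateTimeIntervalMs / size;
        this.counters = new long[size];
        this.ticks = new long[size];

        Arrays.fill(ticks, -1L);
    }

    /** Adds hits observed at the given time (milliseconds, non-negative). */
    synchronized void add(long nowMs, long hits) {
        long tick = nowMs / subIntervalMs;
        int slot = (int)(tick % counters.length);

        // The slot still holds data from an older sub-interval: reset it before reuse.
        if (ticks[slot] != tick) {
            ticks[slot] = tick;
            counters[slot] = 0;
        }

        counters[slot] += hits;
    }

    /** Returns the approximate number of hits within the last rate interval. */
    synchronized long value(long nowMs) {
        long tick = nowMs / subIntervalMs;
        long sum = 0;

        for (int i = 0; i < counters.length; i++) {
            // Count a slot only if it was written within the last `size` sub-intervals.
            if (ticks[i] >= 0 && tick - ticks[i] < counters.length)
                sum += counters[i];
        }

        return sum;
    }

    public static void main(String[] args) {
        SlidingHitRate rate = new SlidingHitRate(1000, 4);

        rate.add(0, 1);
        rate.add(100, 2);
        rate.add(300, 5);

        System.out.println(rate.value(900));
        System.out.println(rate.value(1100));
    }
}
```

In this sketch a larger size only makes the window slide in finer steps; the
reported value remains a single long, which is consistent with the remark
that the parameter is an implementation detail rather than something a user
needs to tune.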
