ignite-dev mailing list archives

From Nikolay Izhikov <nizhi...@apache.org>
Subject Re: [IEP-35] Monitoring & Profiling. Proof of concept
Date Tue, 14 May 2019 13:42:53 GMT
A ticket for IEP Phase 1 has been created: https://issues.apache.org/jira/browse/IGNITE-11848


On Mon, 13/05/2019 at 18:06 +0300, Nikolay Izhikov wrote:
> Hello, Igniters.
> 
> We have discussed this IEP [1] with Alexey Goncharyuk, Anton Vinogradov, Andrey Gura, Alexey Scherbakov and Pavel Kovalenko.
> 
> Issues to address:
> 
> 1. Study the experience of the following libraries and tools:
> 	* OpenTracing
> 	* OpenCensus
> 	* DropWizard
> 
> 2. Support a histogram sensor: a sensor that counts the values falling into predefined segments.
> 
> 3. Use more widely adopted naming (like in OpenCensus?).
> 
> 4. Consider using OpenCensus as the default implementation for local metric storage.
> 
> 5. Measure the performance penalty of metrics for 5_000 caches.
> 
> 6. Some metrics should be part of the public API and others should not (the latter may be changed or removed in a release without warning).
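
Item 2 above (the histogram sensor) could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `HistogramSensor` and its methods are hypothetical and are not taken from the IEP or the PR. The sensor keeps raw per-segment counts only, in line with the "raw values" principle discussed later in the thread.

```java
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical histogram sensor: counts values that fall into predefined segments. */
class HistogramSensor {
    /** Inclusive upper bounds of the segments; assumed to be sorted in ascending order. */
    private final long[] bounds;

    /** One counter per segment, plus a last counter for values above every bound. */
    private final AtomicLongArray counts;

    HistogramSensor(long... bounds) {
        this.bounds = bounds.clone();
        this.counts = new AtomicLongArray(bounds.length + 1);
    }

    /** Increments the counter of the first segment whose upper bound is not below the value. */
    void record(long val) {
        int idx = 0;

        while (idx < bounds.length && val > bounds[idx])
            idx++;

        counts.incrementAndGet(idx);
    }

    /** Returns a copy of the raw per-segment counters. */
    long[] value() {
        long[] res = new long[counts.length()];

        for (int i = 0; i < res.length; i++)
            res[i] = counts.get(i);

        return res;
    }
}
```

With bounds `10, 100, 1000` the sensor has four segments: up to 10, up to 100, up to 1000, and everything above.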
> 
> My plan for Phase #1 is the following:
> 
> 1. Address the issues.
> 2. Prepare the public API.
> 3. Prepare a PR for the monitoring subsystem, with the existing metrics rewritten on top of it.
> 4. Prepare a PR with lists for each user API.
> 5. Collect feedback on #4.
> 6. Design a log exposer. Consider using the JFR format or some other widely used, tool-compatible format.
> 
> [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=112820392
> 
> On Thu, 02/05/2019 at 14:02 +0300, Nikolay Izhikov wrote:
> > Hello, Maxim.
> > 
> > > How will throughput sensor values, which require an interval for the rate calculation, be recorded?
> > 
> > I answered this question in the "Design principles" section of the IEP:
> > 
> > ```
> > Sensors should contain only raw values. No aggregation of numeric metrics on Ignite side.
> > Min, max, avg and other functions are the matter of an external monitoring system.
> > ```
> > 
> > Throughput is a function `(S(t2) - S(t1))/(t2-t1)`
> > where S(t) is the sensor value at some point in time t.
> > 
> > So it seems that throughput calculation is the responsibility of an external system.
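
As a minimal sketch of what the external system would compute from two raw samples, the formula above can be written as a small helper. The class and parameter names here are hypothetical, and timestamps are assumed to be in milliseconds.

```java
/** Hypothetical helper that an external monitoring system could apply to raw sensor samples. */
final class Throughput {
    private Throughput() {
        // Static helper, no instances.
    }

    /**
     * Computes (S(t2) - S(t1)) / (t2 - t1), scaled to operations per second.
     *
     * @param s1 Sensor value at time t1.
     * @param t1Millis Time of the first sample, in milliseconds.
     * @param s2 Sensor value at time t2.
     * @param t2Millis Time of the second sample, in milliseconds.
     */
    static double perSecond(long s1, long t1Millis, long s2, long t2Millis) {
        if (t2Millis <= t1Millis)
            throw new IllegalArgumentException("t2 must be greater than t1");

        return (s2 - s1) * 1000.0 / (t2Millis - t1Millis);
    }
}
```

For example, a counter that grows from 100 to 600 over 5 seconds gives 100 operations per second; the sensor itself never needs to know the sampling interval.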
> > 
> > What do you think?
> > 
> > > It seems to me that we can add an additional parameter of `sensitivityLevel` to provide for the user a flexible sensor control (e.g., INFO, WARN, NOTICE, DEBUG).
> > 
> > For now, I think that all sensors and lists will be very (very!) lightweight.
> > So, we should be able to disable/enable them, for sure.
> > 
> > But we should be able to turn the whole Ignite subsystem off and on
> > for the case where a particular workload has strict performance limitations.
> > 
> > So, we have two "levels" of monitoring: INFO and DEBUG (for profiling: IEP-35, Phase 3).
> > For example, AFAIK we can't disable the current SQL system views (why should we?).
> > 
> > On Tue, 30/04/2019 at 14:33 +0300, Maxim Muzafarov wrote:
> > > Hello Nikolay,
> > > 
> > > I've looked through the changes in your PR.
> > > 
> > > > Sensors
> > > 
> > > How will throughput sensor values, which require an interval for the
> > > rate calculation, be recorded? Do we have such an example? For
> > > instance, getAllocationRate() or getEvictionRate(). These metrics are
> > > out of the scope of current PoC and IEP as they are not related to the
> > > user metrics, but it is a good example of a particular metric type.
> > > 
> > > It seems to me that we can add an additional parameter of
> > > `sensitivityLevel` to provide for the user a flexible sensor control
> > > (e.g., INFO, WARN, NOTICE, DEBUG).
> > > 
> > > It also seems that a completely functional Java approach can be
> > > used for the sensors' getValue(). Am I right?
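
One way to read the "functional" suggestion is a sensor that holds no state of its own and delegates `getValue()` to a supplier. The sketch below is an assumption about what was meant, not code from the PR; `LongSensor` and its members are hypothetical names.

```java
import java.util.function.LongSupplier;

/** Hypothetical sensor whose value is computed on demand by a supplier. */
final class LongSensor {
    /** Sensor name. */
    private final String name;

    /** Source of the raw value, for example a method reference to an existing counter. */
    private final LongSupplier supplier;

    LongSensor(String name, LongSupplier supplier) {
        this.name = name;
        this.supplier = supplier;
    }

    String name() {
        return name;
    }

    /** Reads the current raw value from the underlying source. */
    long getValue() {
        return supplier.getAsLong();
    }
}
```

A sensor could then be registered over an existing counter with a method reference, e.g. `new LongSensor("evictions", counter::get)` for some `AtomicLong counter`.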
> > > 
> > > On Mon, 29 Apr 2019 at 11:44, Nikolay Izhikov <nizhikov@apache.org> wrote:
> > > > 
> > > > Hello, Vyacheslav.
> > > > 
> > > > Thanks for the feedback!
> > > > 
> > > > > HttpExposer with Jetty's dependencies should be detached from the core module.
> > > > 
> > > > Agreed. The module hierarchy is a matter for the next steps.
> > > > For now, it is just a proof of my ideas for Ignite monitoring that we can discuss.
> > > > 
> > > > > I like your approach with 'wrapper' for monitored objects, like 'ComputeTaskInfo' in PR, and don't like using 'ServiceConfiguration' directly as a monitored object for services
> > > > 
> > > > Agreed in general.
> > > > It seems that choosing the right data to expose is a matter of separate discussion for each Ignite entity.
> > > > I plan to file a ticket for each entity so that anyone interested can share their vision there.
> > > > 
> > > > > In my opinion, each sensor should have a timestamp.
> > > > 
> > > > I'm not sure that *every* sensor should have a directly associated timestamp.
> > > > It seems we should support sensors without a timestamp, at least for current monitoring numbers.
> > > > 
> > > > > Also, it'd be great to have an ability to store a list of a fixed size of last N sensors
> > > > 
> > > > What use-cases do you know for such sensors?
> > > > We have plans to support fixed-size lists to show "Last N SQL queries" or similar data.
> > > > Essentially, a sensor is just a single value with a name and a known meaning.
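
A fixed-size "last N" list of the kind mentioned above could look roughly like this. It is a sketch only: `LastNList` is a hypothetical name, and simple synchronization is assumed to be good enough for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical fixed-size list that keeps only the last N added items. */
final class LastNList<T> {
    /** Maximum number of items kept. */
    private final int capacity;

    /** Items in insertion order, oldest first. */
    private final ArrayDeque<T> items = new ArrayDeque<>();

    LastNList(int capacity) {
        if (capacity <= 0)
            throw new IllegalArgumentException("capacity must be positive");

        this.capacity = capacity;
    }

    /** Adds an item, evicting the oldest one when the list is full. */
    synchronized void add(T item) {
        if (items.size() == capacity)
            items.removeFirst();

        items.addLast(item);
    }

    /** Returns a copy of the current items, oldest first. */
    synchronized List<T> snapshot() {
        return new ArrayList<>(items);
    }
}
```

An exposer would read `snapshot()` on each pull, so entries older than the last N are simply no longer visible.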
> > > > 
> > > > > It'd be great if you provide a more extended test to show the work of the system.
> > > > 
> > > > Sorry for that :)
> > > > When you run 'MonitoringSelfTest', you should open http://localhost:8080/ignite/monitoring to view the exposed info.
> > > > I have provided this info in a gist: https://gist.github.com/nizhikov/aa1e6222e6a3456472b881b8deb0e24d
> > > > 
> > > > I will extend this test to print the results to the console in the next iterations - stay tuned :)
> > > > 
> > > > On Sun, 28/04/2019 at 23:35 +0300, Vyacheslav Daradur wrote:
> > > > > Hi, Nikolay,
> > > > > 
> > > > > I looked through PR and IEP, and I have some comments:
> > > > > 
> > > > > It would be better to implement it as a separate module, I can't say
> > > > > if it is possible for the main part of monitoring or not, but I
> > > > > believe that HttpExposer with Jetty's dependencies should be detached
> > > > > from the core module.
> > > > > 
> > > > > I like your approach with 'wrapper' for monitored objects, like
> > > > > 'ComputeTaskInfo' in PR, and don't like using 'ServiceConfiguration'
> > > > > directly as a monitored object for services. I believe we shouldn't
> > > > > mix approaches. It'd be better to always use some kind of container with
> > > > > the monitored object's information to work with such data.
> > > > > 
> > > > > In my opinion, each sensor should have a timestamp. Usually monitoring
> > > > > systems aggregate data and build graphs according to the sensors'
> > > > > timestamps.
> > > > > 
> > > > > Also, it'd be great to have an ability to store a list of a fixed size
> > > > > of last N sensors, not to miss them without pushing to an external
> > > > > monitoring system.
> > > > > 
> > > > > It'd be great if you provide a more extended test to show the work of
> > > > > the system. Everybody who looks at the PR needs to run the test and get
> > > > > the info manually to see the completeness of sensors; this might be
> > > > > simplified by a proper test.
> > > > > 
> > > > > Thank you!
> > > > > 
> > > > > 
> > > > > 
> > > > > On Fri, Apr 26, 2019 at 5:56 PM Nikolay Izhikov <nizhikov@apache.org> wrote:
> > > > > > 
> > > > > > Hello, Igniters.
> > > > > > 
> > > > > > I've prepared a Proof of Concept for IEP-35 [1].
> > > > > > PR can be found here - https://github.com/apache/ignite/pull/6510
> > > > > > 
> > > > > > I've made the following changes:
> > > > > > 
> > > > > >         1. `GridMonitoringManager` [2] - a simple implementation of a manager that stores all monitoring info.
> > > > > >         2. `HttpPullExposerSpi` [3] - a pull exposer implementation that responds with JSON at http://localhost:8080/ignite/monitoring. The JSON content can be viewed in the gist [4].
> > > > > >         3. Compute task start and finish are monitored in the "compute" list [5].
> > > > > >         4. Service registrations are monitored in the "service" list [6].
> > > > > >         5. The current `IgniteSpiMBeanAdapter` is rewritten using `GridMonitoringManager` [7].
> > > > > > 
> > > > > > Design principles, monitoring subsystem details and new Ignite entities can be found in IEP [1].
> > > > > > 
> > > > > > My next steps will be:
> > > > > > 
> > > > > >         1. Implement a JMX exposer.
> > > > > >         2. Register all "lists" and "sensor groups" as SQL system views.
> > > > > >         3. Add monitoring for all currently unmonitored Ignite APIs (described in the IEP).
> > > > > >         4. Rewrite the existing JMX metrics using GridMonitoringManager.
> > > > > > 
> > > > > > Please share your thoughts.
> > > > > > 
> > > > > > Part of JSON file:
> > > > > > ```
> > > > > >     "COMPUTE": {
> > > > > >       "tasks": {
> > > > > >         "name": "tasks",
> > > > > >         "rows": [
> > > > > >           {
> > > > > >             "id": "0798817a-eeec-4386-9af7-94edb39ffced",
> > > > > >             "sessionId": "a1814f95a61-912451ff-ca7b-4764-a7fd-728f6a900000",
> > > > > >             "data": {
> > > > > >               "taskClasName": "org.apache.ignite.monitoring.MonitoringSelfTest$$Lambda$145/1500885480",
> > > > > >               "startTime": 1556287337944,
> > > > > >               "timeout": 9223372036854776000,
> > > > > >               "execName": null
> > > > > >             },
> > > > > >             "name": "anotherBroadcast"
> > > > > >           }
> > > > > > ```
> > > > > > 
> > > > > > [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=112820392
> > > > > > [2] https://github.com/apache/ignite/pull/6510/files#diff-ec7d5cf5e35b99303deb9accee153c50R34
> > > > > > [3] https://github.com/apache/ignite/pull/6510/files#diff-32239c45e0ae3b692af2eae7078e1436R47
> > > > > > [4] https://gist.github.com/nizhikov/aa1e6222e6a3456472b881b8deb0e24d
> > > > > > [5] https://github.com/apache/ignite/pull/6510/files#diff-d651ed29d07bd0c5ce291654a3254cc0R749
> > > > > > [6] https://github.com/apache/ignite/pull/6510/files#diff-0b4e54fbda2b0da1c10eff48416336f6R1606
> > > > > > [7] https://github.com/apache/ignite/pull/6510/files#diff-4398bf118150500e059069b3a1638ec7R61
> > > > > 
> > > > > 
> > > > > 
