flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.
Date Thu, 28 Apr 2016 14:21:13 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15262213#comment-15262213 ]

ASF GitHub Bot commented on FLINK-1502:

GitHub user zentol opened a pull request:


    [FLINK-1502] Basic Metric System

    This PR is a preview of the new metric system. 
    It is not complete because
    * there is no documentation for the website
    * a few smaller parts also don't have code documentation
    * I haven't tried out the ganglia/statsD reporter yet
    In general though it works and it is now time to gather some feedback.
    The PR is organized into several commits to give it some structure. They are generally divided
by which part of the system they expose the metric system to. Note that the last commit, "Metric
Usage Examples", is not technically part of the PR but showcases the usage.
    The division is coarse, so some changes may technically belong to several commits.
    ## General overview
    A user can access a system-provided MetricGroup to register a Metric, which is stored
in a MetricRegistry and forwarded regularly to a Reporter, which communicates it to an external
system.
    ## MetricGroups
    MetricGroups are the user-facing part of the system. They are a nested data structure,
containing other groups and metrics, that allow registering metrics with Flink while organizing
them in a hierarchy.
    For example, every TaskManager has a MetricGroup, and for every task that is deployed
a new sub-group for that task is added. This task specific group is propagated through the
task stack, with new groups/metrics being added. Within a UDF the operator MetricGroup is
accessed through the RuntimeContext.
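    The nesting described above can be pictured with a minimal sketch. All class and method
names here (Group, SimpleCounter, addGroup, scope) are hypothetical and chosen for illustration;
they are not the classes introduced by this PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of a nested metric group; not the classes from this PR. */
public class MetricGroupSketch {

    /** A minimal counter metric. */
    static final class SimpleCounter {
        private final AtomicLong count = new AtomicLong();
        void inc() { count.incrementAndGet(); }
        long getCount() { return count.get(); }
    }

    /** A group holds named sub-groups and named metrics. */
    static final class Group {
        private final String name;
        private final Group parent;
        private final Map<String, Group> children = new LinkedHashMap<>();
        private final Map<String, SimpleCounter> counters = new LinkedHashMap<>();

        Group(String name, Group parent) {
            this.name = name;
            this.parent = parent;
        }

        /** Returns the existing sub-group with this name, or creates it. */
        Group addGroup(String childName) {
            return children.computeIfAbsent(childName, n -> new Group(n, this));
        }

        /** Returns the existing counter with this name, or registers a new one. */
        SimpleCounter counter(String metricName) {
            return counters.computeIfAbsent(metricName, n -> new SimpleCounter());
        }

        /** Dotted path from the root group down to this group. */
        String scope() {
            return parent == null ? name : parent.scope() + "." + name;
        }
    }

    public static void main(String[] args) {
        // TaskManager group -> task group -> operator group, as in the description above.
        Group taskManager = new Group("taskmanager", null);
        Group operator = taskManager.addGroup("myTask").addGroup("myOperator");
        SimpleCounter records = operator.counter("numRecords");
        records.inc();
        System.out.println(operator.scope() + ".numRecords = " + records.getCount());
    }
}
```

    Deriving the metric name from the group path is one way the hierarchy can carry over to a
reporter; whether the PR builds names this way is an assumption.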
    ## Metrics
    Metrics are the objects used to measure something.
    Metrics include
    * Gauges, which measure a value on demand
    * Meters, which measure the rate/count of events
    * Histograms, which measure the distribution of long values
    * Counters, which keep a running count
    * Timers, which measure the rate of calls and the distribution of execution time for a given piece
of code.
    Under the hood we use the Metrics from the Dropwizard library. In order to ensure interface
stability, and to give us the option to reimplement things without breaking everything, they
(and other classes) are wrapped to match our interfaces. 
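    The wrapping idea can be sketched as a plain adapter. The Counter interface and both
classes below are hypothetical; LibraryCounter merely stands in for a third-party metric class
and does not reproduce the Dropwizard API.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch of hiding a library metric behind a stable interface. */
public class WrappedCounterSketch {

    /** The stable interface that user code would program against (assumed shape). */
    interface Counter {
        void inc();
        void inc(long n);
        long getCount();
    }

    /** Stand-in for a third-party metric class; NOT the Dropwizard API. */
    static final class LibraryCounter {
        private final LongAdder adder = new LongAdder();
        void add(long n) { adder.add(n); }
        long sum() { return adder.sum(); }
    }

    /** Adapter that exposes the library class through the stable interface. */
    static final class LibraryBackedCounter implements Counter {
        private final LibraryCounter delegate = new LibraryCounter();
        @Override public void inc() { delegate.add(1); }
        @Override public void inc(long n) { delegate.add(n); }
        @Override public long getCount() { return delegate.sum(); }
    }

    public static void main(String[] args) {
        // Callers only see Counter, so the backing class can be swapped later.
        Counter counter = new LibraryBackedCounter();
        counter.inc();
        counter.inc(4);
        System.out.println(counter.getCount());
    }
}
```

    Because callers depend only on the interface, the delegate could be replaced by a
reimplementation without touching user code, which is the motivation given above.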
    ## Reporters
    Reporters are the components that communicate the Metrics to the outside world. With this
PR we allow exporting Metrics via JMX (default), Graphite, Ganglia and StatsD. The interval
at which they report is configurable.
    Similarly to Metrics, we partially use reporters from the Dropwizard library (Graphite,
Ganglia), again wrapped to match our interfaces.
    Reporters are configured via flink-conf.yaml.
    An example configuration might look like this:
    metrics.reporter.class: org.apache.flink.metrics.GraphiteReporter
    metrics.reporter.arguments: --host localhost --port 8080
    metrics.reporter.interval: 30 SECONDS
    Reporters are instantiated generically and configured with a Configuration containing
the parsed arguments. Reporters other than the JMX reporter are not part of the distribution and
have to be added to the classpath manually (usually by putting the jar into /lib).
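    As an illustration of how the example values above could be turned into a configuration,
here is a minimal sketch. The helper names (parseArguments, parseIntervalMillis) and the
"--key value" and "<amount> <TimeUnit>" formats are assumptions inferred from the example; the
actual parsing code in the PR may differ.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of parsing the reporter settings shown above. */
public class ReporterConfigSketch {

    /** Parses "--key value" pairs, e.g. "--host localhost --port 8080". */
    static Map<String, String> parseArguments(String arguments) {
        Map<String, String> parsed = new LinkedHashMap<>();
        String trimmed = arguments.trim();
        if (trimmed.isEmpty()) {
            return parsed;
        }
        String[] tokens = trimmed.split("\\s+");
        if (tokens.length % 2 != 0) {
            throw new IllegalArgumentException("Missing value in: " + arguments);
        }
        for (int i = 0; i < tokens.length; i += 2) {
            if (!tokens[i].startsWith("--")) {
                throw new IllegalArgumentException("Expected --key but got: " + tokens[i]);
            }
            parsed.put(tokens[i].substring(2), tokens[i + 1]);
        }
        return parsed;
    }

    /** Converts an interval such as "30 SECONDS" into milliseconds. */
    static long parseIntervalMillis(String interval) {
        String[] parts = interval.trim().split("\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected '<amount> <unit>' but got: " + interval);
        }
        long amount = Long.parseLong(parts[0]);
        // The unit is assumed to name a java.util.concurrent.TimeUnit constant.
        TimeUnit unit = TimeUnit.valueOf(parts[1].toUpperCase(Locale.ROOT));
        return unit.toMillis(amount);
    }

    public static void main(String[] args) {
        Map<String, String> parsed = parseArguments("--host localhost --port 8080");
        System.out.println(parsed.get("host") + ":" + parsed.get("port"));
        System.out.println(parseIntervalMillis("30 SECONDS") + " ms");
    }
}
```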
    JMX uses port 9010 by default. This can be configured by setting the metrics.jmx.port
property in flink-conf.yaml.
    ## Registry
    The registry is essentially just a connection between all MetricGroups and the Reporter.
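    A rough sketch of that connection follows. The Registry and Reporter types and their
methods are hypothetical, and metrics are reduced to gauges returning a long to keep the example
short; the real registry in the PR is not shown in this message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical sketch of a registry that forwards metrics to a reporter. */
public class RegistrySketch {

    /** Receives one metric value at a time (assumed shape). */
    interface Reporter {
        void report(String name, long value);
    }

    static final class Registry {
        private final Map<String, LongSupplier> metrics = new ConcurrentHashMap<>();
        private final Reporter reporter;

        Registry(Reporter reporter) {
            this.reporter = reporter;
        }

        /** Called by metric groups when a metric is registered. */
        void register(String name, LongSupplier gauge) {
            metrics.put(name, gauge);
        }

        /** Reads every registered metric and hands its current value to the reporter. */
        void reportOnce() {
            metrics.forEach((name, gauge) -> reporter.report(name, gauge.getAsLong()));
        }

        /** Starts periodic reporting; the caller shuts the returned executor down. */
        ScheduledExecutorService start(long interval, TimeUnit unit) {
            ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
            executor.scheduleAtFixedRate(this::reportOnce, interval, interval, unit);
            return executor;
        }
    }

    public static void main(String[] args) {
        // A reporter that just collects lines instead of talking to an external system.
        List<String> sent = new ArrayList<>();
        Registry registry = new Registry((name, value) -> sent.add(name + "=" + value));
        registry.register("taskmanager.myTask.numRecords", () -> 42L);
        registry.reportOnce();
        System.out.println(sent);
    }
}
```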

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zentol/flink metrics_v2

Alternatively you can review and apply these changes as the patch at:


To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1947
commit b90b53cd73824389b41978f0113ca0c6d3da1422
Author: zentol <chesnay@apache.org>
Date:   2016-04-15T13:57:14Z

    Add basic metric structures
    -add dropwizard dependency to flink-core
    -add metric wrappers
    -add metric groups/category organization
    -add metric registry

commit 45e6e123d37a8fba1bf76386a84436e8fb04a9fa
Author: zentol <chesnay@apache.org>
Date:   2016-04-19T11:28:28Z

    Graphite/Ganglia/StatsD Reporters

commit e634060d83f2b475e954c67424ba39e3ffd92b6b
Author: zentol <chesnay@apache.org>
Date:   2016-04-13T16:47:04Z

    Task Integration
    -included job name in TaskDeploymentDescriptor
    -enabled remote JMX for TaskManager
    -added TaskManager status metrics

commit 20ca6c3b19690e08335e31fcf3377f4a511e9b00
Author: zentol <chesnay@apache.org>
Date:   2016-04-13T14:50:16Z

    Environment Integration
    -add MetricGroup field to environment
    -primary location to retrieve tm/task/subtask keyed metricgroup

commit e8eed4d27361ea311dbf9e9694cca70633d5b54e
Author: zentol <chesnay@apache.org>
Date:   2016-04-13T14:23:54Z

    IO Metrics Integration
    -add metrics for records/bytes read/written

commit f47161db1804909f46520844d23a4e3148387f7b
Author: zentol <chesnay@apache.org>
Date:   2016-04-14T10:02:51Z

    Streaming Operator Integration

commit c0c2d967dd53ceac966af4b7400982de5e53a272
Author: zentol <chesnay@apache.org>
Date:   2016-04-13T15:17:15Z

    Batch Operator Integration
    -add getMetricGroup() method to TaskContext for driver access
    -add MetricGroup field to ChainedDriver for chained driver access

commit fa7a8947bde42333748ae02d7c02023f89d20e41
Author: zentol <chesnay@apache.org>
Date:   2016-04-13T14:51:46Z

    Context Integration
    -add getMetricGroup() method to udf-context for udf/IO-format access

commit 9082d0697ad7f5c9146d77c932eb551eabba40ac
Author: zentol <chesnay@apache.org>
Date:   2016-04-13T14:58:38Z

    Metric Usage Examples


> Expose metrics to graphite, ganglia and JMX.
> --------------------------------------------
>                 Key: FLINK-1502
>                 URL: https://issues.apache.org/jira/browse/FLINK-1502
>             Project: Flink
>          Issue Type: Sub-task
>          Components: JobManager, TaskManager
>    Affects Versions: 0.9
>            Reporter: Robert Metzger
>            Assignee: Chesnay Schepler
>            Priority: Minor
>             Fix For: pre-apache
> The metrics library makes it easy to expose collected metrics to other systems such as
> graphite, ganglia or Java's JVM (VisualVM).

This message was sent by Atlassian JIRA
