eagle-user mailing list archives

From "Zhang, Edward (GDI Hadoop)" <yonzh...@ebay.com>
Subject Re: [Discuss] Hadoop metrics,job,GC monitoring
Date Fri, 08 Jan 2016 06:53:08 GMT
Please review the latest design for monitoring Hadoop native metrics.



On 12/14/15, 23:48, "Zhang, Edward (GDI Hadoop)" <yonzhang@ebay.com> wrote:

>started some documentation on
>Thanks Hao, Ralph etc. for offline review and suggestions, I would improve
>In terms of the question "if user adds a new metric to monitor, how
>processing layer would change accordingly"
>I think if a user adds a new metric, this metric should be added to the
>metadata table, so that the data source layer and the processing layer see
>a consistent list of metrics.
>But we still need to bake this design, please comment whatever is your
>On 12/14/15, 11:04, "Arun Manoharan" <arunmanoharan@apache.org> wrote:
>>Thanks Edward for starting the thread. I think it is important to have
>>job monitoring (MR/Spark) workloads for performance of the cluster and
>>But it will be beneficial to have an extensible framework where users can
>>create business rules like "I want an alert when NN is in safemode or RM
>>flipping etc".
>>On Mon, Dec 14, 2015 at 10:58 AM, Zhang, Edward (GDI Hadoop) <
>>yonzhang@ebay.com> wrote:
>>> Hi Eagle devs/users,
>>> As proposed in the Apache Eagle incubator proposal, Eagle will start
>>> design/dev to support Hadoop system monitoring besides security
>>> which includes Hadoop native metrics, job, gclog etc.
>>> The community is also interested in Hadoop system monitoring by Eagle
>>> we recently talked about Eagle product in public conferences, meet up
>>> Take Hadoop native metrics as an example, first of all those metrics
>>> pretty valuable in determining system health status, secondly
>>> huge amount metrics, visualizing, and alerting is very challenging.  We
>>> need to think of declarative collection, dynamic aggregation, metric
>>> metric query engine etc.
>>> Besides technical design, comprehensive policy/rule are also valuable
>>> be shared in the community. Those policy/rule represent best practice
>>> the world to manage large Hadoop clusters.
>>> Please suggest whatever is for engineering design or business
>>> Thanks
>>> Edward
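
The idea raised in the thread, that a newly added metric goes into a metadata table so the data source layer and the processing layer see the same list, could be sketched roughly as below. This is only an illustration under stated assumptions, not Eagle's actual design or API: `MetricDef`, `MetricRegistry`, `DataSourceLayer`, `ProcessingLayer`, the metric names, and the simple threshold rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDef:
    """One row of the (hypothetical) metadata table."""
    name: str         # illustrative metric name, not a real Hadoop identifier
    component: str    # e.g. "namenode"
    threshold: float  # alert when the sampled value exceeds this


class MetricRegistry:
    """In-memory stand-in for the metadata table shared by both layers."""

    def __init__(self):
        self._metrics = {}

    def add(self, metric):
        self._metrics[metric.name] = metric

    def get(self, name):
        return self._metrics.get(name)

    def names(self):
        return sorted(self._metrics)


class DataSourceLayer:
    """Collects only the metrics that are registered in the metadata table."""

    def __init__(self, registry):
        self.registry = registry

    def collect(self, raw_samples):
        return {
            name: value
            for name, value in raw_samples.items()
            if self.registry.get(name) is not None
        }


class ProcessingLayer:
    """Evaluates a simple threshold rule using the same metadata table."""

    def __init__(self, registry):
        self.registry = registry

    def evaluate(self, samples):
        alerts = []
        for name in sorted(samples):
            metric = self.registry.get(name)
            if metric is not None and samples[name] > metric.threshold:
                alerts.append(name)
        return alerts
```

Because both layers read from the same registry object, calling `registry.add(...)` once is enough for a new metric to be collected and evaluated; neither layer keeps its own copy of the metric list. A rule such as "alert when NN is in safemode" would, in this sketch, be a metric whose value is 1 in safemode with a threshold of 0.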
