hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ivan Mitic (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-10090) Jobtracker metrics not updated properly after execution of a mapreduce job
Date Mon, 11 Nov 2013 15:02:17 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-10090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Ivan Mitic updated HADOOP-10090:

    Attachment: OneBoxRepro.png

I was able to repro the problem by looking at JMX metrics thru jconsole. See attached screenshot.

The bug appears to be coming from how the JMX metrics cache is managed in MetricsSourceAdapter.java.
To be more specific:
a) MetricsSourceAdapter keeps track of the latest metrics records obtained from the call into
MetricsSource#getMetrics (see MetricsSourceAdapter#getMetrics)
b) Metrics system timer thread which periodically polls on all sources, only polls for the
metrics that have changed (not all metrics)

Scenario from this bug (JobTracker metrics):
 - Let's assume that the timer runs once per minute
 - At time 0, there were 0 jobs_submitted and 0 jobs_completed, and internal JMX cache reflects
this state
 - At time t1, MetricsSystem timer thread runs and there were 1 jobs_submitted, and 0 jobs_completed.
Internal JMX cache is not updated.
 - At time t1 + 0.5 minutes, above job is completed, and jobs_completed counter is set to
 - At time t1 + 1 minute, MetricsSystem timer thread runs, and only jobs_completed metric
changes from 0 to 1. Internal MetricsSourceAdapter {{lastRecs}} contains only jobs_completed
 - At time t1 + 1 minute + delta, JMX is polled for the current metrics, the current cache
is updated using the current {{lastRecs}}, and the final outcome is jobs_submitted = 0, jobs_completed
= 1.

Since it is not obvious to me what is the intended behavior around metrics reported thru JMX
is, I see two possible directions for the fix:
# JMX always reports the latest snapshot obtained by the MetricsSystem timer thread. This
means that JMX does not report the fresh/current metrics state. Each time metrics are fetched
from the timer thread, internal MetricsSourceAdapter JMX cache would be refreshed to keep
the state in sync.
# JMX always reports the current metric values (with small TTL on the internal cache). In
this design, MetricsSourceAdapter would (almost) always have to poll on the source for JMX.
Additionally, there would have to be means to mark the source as dirty internally when this
happens, to guarantee that the next time timer thread polls for metrics, all metrics are returned
(see b. from above).

Option 2 is definitely nicer from users perspective as JMX reflects the current in-memory
state of servers. 

Please comment back with your thoughts/recommendations.

> Jobtracker metrics not updated properly after execution of a mapreduce job
> --------------------------------------------------------------------------
>                 Key: HADOOP-10090
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10090
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 1.2.1
>            Reporter: Ivan Mitic
>            Assignee: Ivan Mitic
>         Attachments: OneBoxRepro.png
> After executing a wordcount mapreduce sample job, jobtracker metrics are not updated
properly. Often times the response from the jobtracker has higher number of job_completed
than job_submitted (for example 8 jobs completed and 7 jobs submitted). 
> Issue reported by Toma Paunovic.

This message was sent by Atlassian JIRA

View raw message