hadoop-common-issues mailing list archives

From "Ivan Mitic (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-10090) Jobtracker metrics not updated properly after execution of a mapreduce job
Date Thu, 14 Nov 2013 11:03:25 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-10090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Ivan Mitic updated HADOOP-10090:

    Attachment: HADOOP-690.patch

Hi Chris, Luke,

I am attaching the fix for approach #2. The patch includes a unit test that catches the

This fix mitigates the "inconsistent result" issue but does not eliminate it completely.
JMX will always return a complete result, but the sink might still miss some changes. The problem
is that MetricsSourceAdapter must synchronize its calls to MetricsSource#getMetrics; otherwise,
two concurrent threads can snapshot the metrics at the same time, and one of them may not get
all the metrics that have changed (when "all" is not specified, as in the sink case). Because of
the issue described in HADOOP-8050, it is not possible to simply add a lock, since that could introduce
a deadlock (we would also have to eliminate the system source issue).
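
To make the "sink might miss some changes" case concrete, here is a rough toy model. It is not the Hadoop code and not the attached patch: ToySource, incr and snapshot are hypothetical names, and the only assumption is that a snapshot emits a metric when "all" is set or its changed flag is set, and clears the flag once the metric is emitted. Under that assumption, a snapshot taken with all=true (as for JMX) consumes the flag, so a following snapshot taken with all=false (as for the sink) sees nothing:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SnapshotSketch {

    /** Hypothetical stand-in for a metrics source with a per-metric "changed" flag. */
    static class ToySource {
        private final Map<String, Long> values = new LinkedHashMap<>();
        private final Map<String, Boolean> changed = new LinkedHashMap<>();

        /** Increment a counter and mark it as changed. */
        synchronized void incr(String name) {
            values.merge(name, 1L, Long::sum);
            changed.put(name, true);
        }

        /**
         * Emit every metric when all == true, otherwise only the metrics
         * changed since they were last emitted. Emitting clears the flag.
         */
        synchronized Map<String, Long> snapshot(boolean all) {
            Map<String, Long> out = new LinkedHashMap<>();
            for (Map.Entry<String, Long> e : values.entrySet()) {
                if (all || changed.get(e.getKey())) {
                    out.put(e.getKey(), e.getValue());
                    changed.put(e.getKey(), false);
                }
            }
            return out;
        }
    }

    public static void main(String[] args) {
        ToySource source = new ToySource();
        source.incr("job_submitted");
        source.incr("job_completed");

        // A JMX-style reader asks for everything and gets a complete result.
        Map<String, Long> jmx = source.snapshot(true);
        // A sink-style reader asks only for changes, but the flags were
        // already cleared by the snapshot above, so it gets nothing.
        Map<String, Long> sink = source.snapshot(false);

        System.out.println("jmx  = " + jmx);
        System.out.println("sink = " + sink);
    }
}
```

The toy serializes snapshots with a single lock, which is exactly the step that cannot be applied directly in MetricsSourceAdapter because of HADOOP-8050. Even with that lock, the two readers still share one set of changed flags, so this only illustrates the remaining inconsistency; it does not suggest a fix.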

Let me know what you think. 

> Jobtracker metrics not updated properly after execution of a mapreduce job
> --------------------------------------------------------------------------
>                 Key: HADOOP-10090
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10090
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 1.2.1
>            Reporter: Ivan Mitic
>            Assignee: Ivan Mitic
>         Attachments: HADOOP-690.patch, OneBoxRepro.png
> After executing a wordcount mapreduce sample job, jobtracker metrics are not updated
properly. Oftentimes the response from the jobtracker reports a higher number for job_completed
than for job_submitted (for example, 8 jobs completed and 7 jobs submitted).
> Issue reported by Toma Paunovic.
