hadoop-common-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-10090) Jobtracker metrics not updated properly after execution of a mapreduce job
Date Thu, 14 Nov 2013 19:45:24 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-10090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822816#comment-13822816 ]

Chris Nauroth commented on HADOOP-10090:

Here is a thought regarding the system source issue and reintroducing synchronization around
{{MetricsSource#getMetrics}} calls.

My understanding of the HADOOP-8050 deadlock is that we had a lock ordering conflict between
a JMX thread (locking {{MetricsSourceAdapter}} and then {{MetricsSystemImpl}}) and a snapshotting
thread (locking {{MetricsSystemImpl}} and then {{MetricsSourceAdapter}}).  HADOOP-8050 resolved
the deadlock by releasing the lock on the {{MetricsSourceAdapter}} before calling {{MetricsSource#getMetrics}}.
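The lock-ordering conflict described above can be reproduced in miniature. This is a hypothetical sketch, not Hadoop code: `adapterLock` and `systemLock` are plain `ReentrantLock` stand-ins for the monitors of {{MetricsSourceAdapter}} and {{MetricsSystemImpl}}, and `tryLock` with a timeout replaces `synchronized` so the program reports the would-be deadlock instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderDemo {
    // Hypothetical stand-ins for the MetricsSourceAdapter and MetricsSystemImpl monitors.
    static final ReentrantLock adapterLock = new ReentrantLock();
    static final ReentrantLock systemLock = new ReentrantLock();

    /** Returns true if neither thread could take its second lock, i.e. real locks would deadlock. */
    static boolean conflictingOrderDeadlocks() {
        CountDownLatch holdingFirst = new CountDownLatch(2);
        CountDownLatch doneTrying = new CountDownLatch(2);
        AtomicBoolean jmxGotBoth = new AtomicBoolean();
        AtomicBoolean snapshotGotBoth = new AtomicBoolean();

        // JMX thread: adapter, then system. Snapshot thread: system, then adapter.
        Thread jmx = new Thread(() ->
            acquireInOrder(adapterLock, systemLock, holdingFirst, doneTrying, jmxGotBoth));
        Thread snapshot = new Thread(() ->
            acquireInOrder(systemLock, adapterLock, holdingFirst, doneTrying, snapshotGotBoth));
        jmx.start();
        snapshot.start();
        try {
            jmx.join();
            snapshot.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !jmxGotBoth.get() && !snapshotGotBoth.get();
    }

    // Takes `first`, waits until the other thread holds its own first lock, then
    // tries `second` with a timeout (a plain lock() here would block forever).
    private static void acquireInOrder(ReentrantLock first, ReentrantLock second,
                                       CountDownLatch holdingFirst, CountDownLatch doneTrying,
                                       AtomicBoolean gotBoth) {
        first.lock();
        try {
            holdingFirst.countDown();
            holdingFirst.await();              // both threads now hold their first lock
            if (second.tryLock(100, TimeUnit.MILLISECONDS)) {
                try {
                    gotBoth.set(true);
                } finally {
                    second.unlock();
                }
            }
            doneTrying.countDown();
            doneTrying.await();                // keep holding `first` until the other thread gave up too
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlock: " + conflictingOrderDeadlocks());
    }
}
```

Because each thread keeps its first lock until both have finished trying for the second, neither attempt can succeed, and `main` prints `deadlock: true`.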

What if instead we do the following:

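# Change {{MetricsSourceAdapter#getMetrics}} as follows:
{code}
  Iterable<MetricsRecordImpl> getMetrics(MetricsBuilderImpl builder,
                                         boolean all) {
    // Lock the source first, then the adapter, matching the order used by
    // the snapshotting thread (MetricsSystemImpl, then MetricsSourceAdapter).
    synchronized (source) {
      synchronized (this) {
        // existing method logic here
      }
    }
  }
{code}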
# Change {{MetricsSystemImpl}} so that it implements {{MetricsSource}} directly instead of
using an anonymous inner class.
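The two proposed changes can be modeled in a much-reduced sketch. Everything here is hypothetical: `Source`, `Adapter`, and `SystemImpl` stand in for {{MetricsSource}}, {{MetricsSourceAdapter}}, and {{MetricsSystemImpl}}, and `snapshot()` stands in for the snapshotting path. It only shows that once the system is itself the source, the JMX-style path and the snapshot path both acquire the system monitor before the adapter monitor.

```java
import java.util.ArrayList;
import java.util.List;

public class ConsistentOrderSketch {
    /** Hypothetical, reduced stand-in for MetricsSource. */
    interface Source {
        List<String> getMetrics();
    }

    /** Stand-in for MetricsSourceAdapter: locks the source, then itself (part 1). */
    static class Adapter {
        final Source source;

        Adapter(Source source) {
            this.source = source;
        }

        List<String> getMetrics() {
            synchronized (source) {       // first: the source (the system itself, after part 2)
                synchronized (this) {     // second: the adapter
                    return source.getMetrics();
                }
            }
        }
    }

    /** Stand-in for MetricsSystemImpl implementing the source interface directly (part 2). */
    static class SystemImpl implements Source {
        final Adapter sysSourceAdapter = new Adapter(this);
        private int snapshots;

        @Override
        public synchronized List<String> getMetrics() {
            List<String> out = new ArrayList<>();
            out.add("snapshots=" + snapshots);
            return out;
        }

        /** Snapshot path: locks the system, then the adapter, the same order as Adapter.getMetrics. */
        synchronized List<String> snapshot() {
            snapshots++;
            return sysSourceAdapter.getMetrics();
        }
    }

    /** Runs JMX-style and snapshot-style callers concurrently; returns true if both finish. */
    static boolean concurrentCallsFinish(int iterations) {
        SystemImpl system = new SystemImpl();
        Thread jmx = new Thread(() -> {
            for (int i = 0; i < iterations; i++) system.sysSourceAdapter.getMetrics();
        });
        Thread snap = new Thread(() -> {
            for (int i = 0; i < iterations; i++) system.snapshot();
        });
        jmx.setDaemon(true);              // a deadlocked thread must not keep the JVM alive
        snap.setDaemon(true);
        jmx.start();
        snap.start();
        try {
            jmx.join(10_000);
            snap.join(10_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !jmx.isAlive() && !snap.isAlive();
    }
}
```

Finishing a stress run does not prove deadlock freedom on its own; the real argument is that every path takes the two monitors in the same order.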

The first part synchronizes {{getMetrics}} calls using a locking order that's consistent with
the snapshotting threads.  The second part is required so that the first part's synchronization
on the source is really synchronizing on the {{MetricsSystemImpl}} instance instead of the
separate anonymous inner class instance.
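Why the second part matters can be checked with `Thread.holdsLock`. In this hypothetical sketch (the class names are illustrative, not Hadoop's), `synchronized (source)` on an anonymous inner class instance locks only that inner object, while a system that implements the interface directly is locked by the very same statement.

```java
public class MonitorIdentityDemo {
    /** Hypothetical, reduced stand-in for MetricsSource. */
    interface Source {
        String getMetrics();
    }

    /** Current shape: the system exposes an anonymous inner class as its source. */
    static class SystemWithInnerSource {
        final Source source = new Source() {
            @Override
            public String getMetrics() {
                return "inner";
            }
        };
    }

    /** Proposed shape: the system is itself the source. */
    static class SystemAsSource implements Source {
        @Override
        public String getMetrics() {
            return "direct";
        }
    }

    /** Locks `source` as the proposed adapter would and reports whether `system` is now locked too. */
    static boolean lockingSourceLocksSystem(Object system, Source source) {
        synchronized (source) {
            return Thread.holdsLock(system);
        }
    }

    public static void main(String[] args) {
        SystemWithInnerSource inner = new SystemWithInnerSource();
        SystemAsSource direct = new SystemAsSource();
        System.out.println(lockingSourceLocksSystem(inner, inner.source)); // false
        System.out.println(lockingSourceLocksSystem(direct, direct));      // true
    }
}
```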

> Jobtracker metrics not updated properly after execution of a mapreduce job
> --------------------------------------------------------------------------
>                 Key: HADOOP-10090
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10090
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: metrics
>    Affects Versions: 1.2.1
>            Reporter: Ivan Mitic
>            Assignee: Ivan Mitic
>         Attachments: HADOOP-10090.branch-1.patch, OneBoxRepro.png
> After executing a wordcount mapreduce sample job, jobtracker metrics are not updated
> properly. Often the response from the jobtracker reports a higher job_completed count
> than job_submitted count (for example, 8 jobs completed and 7 jobs submitted).
> Issue reported by Toma Paunovic.

This message was sent by Atlassian JIRA
