hadoop-mapreduce-dev mailing list archives

From "Laxman (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAPREDUCE-6351) Reducer hung in copy phase.
Date Sat, 02 May 2015 16:52:06 GMT
Laxman created MAPREDUCE-6351:

             Summary: Reducer hung in copy phase.
                 Key: MAPREDUCE-6351
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6351
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mrv2
    Affects Versions: 2.6.0
            Reporter: Laxman

The reducer gets stuck in the copy phase and makes no progress for a very long time. After
the task is killed manually a couple of times, it completes.

- Verified GC logs. Found no memory-related issues. Attache
- Verified thread dumps. Found no thread-related problems.
- The logs show that the fetcher threads are not copying the map outputs; they are just
waiting for the merge to happen.
- The merge thread is alive and in the wait state.

After careful examination of the logs, thread dumps, and code, this looks to me like a classic
multi-threading issue: the thread goes into the wait state after it has already been notified.

Here is the suspect code flow.

*Thread #1*
Fetcher thread - notification comes first
      synchronized(pendingToBeMerged) {

*Thread #2*
Merge Thread - goes to wait state (Notification goes unconsumed)
        synchronized (pendingToBeMerged) {
          while(pendingToBeMerged.size() <= 0) {
          // Pickup the inputs to merge.
          inputs = pendingToBeMerged.removeFirst();
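
For comparison, here is a minimal, self-contained sketch of the guarded-wait idiom that the excerpt above resembles. It is not Hadoop's code: the class names PendingQueue and PendingQueueDemo and the sample strings are hypothetical, and the excerpt is assumed to follow the usual pattern in which the producer updates the queue and notifies under the same lock. In this pattern, whether a notification that arrives first is "lost" depends on whether the waiter re-checks the shared queue under the lock before calling wait(), so that is one thing worth confirming in the real code path.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

// Hypothetical stand-in for a merge queue; NOT Hadoop's MergeThread.
class PendingQueue<T> {
    private final LinkedList<List<T>> pending = new LinkedList<>();

    // Producer side (the fetcher's role): publish work and notify
    // while holding the same lock the consumer waits on.
    void add(List<T> inputs) {
        synchronized (pending) {
            pending.addLast(inputs);
            pending.notifyAll();
        }
    }

    // Consumer side (the merger's role): check the condition in a loop,
    // so wait() is only entered while the queue is actually empty.
    List<T> take() throws InterruptedException {
        synchronized (pending) {
            while (pending.isEmpty()) {
                pending.wait();
            }
            return pending.removeFirst();
        }
    }
}

class PendingQueueDemo {
    public static void main(String[] args) throws InterruptedException {
        PendingQueue<String> queue = new PendingQueue<>();
        // The notification happens first, before any consumer is waiting.
        queue.add(Arrays.asList("map-output-1", "map-output-2"));
        // The consumer sees a non-empty queue and never calls wait().
        List<String> inputs = queue.take();
        System.out.println(inputs);
    }
}
```

If the real code differs from this sketch in any of these points (for example, the queue is modified outside the lock, or the wait is not re-checked in a loop), a notification could go unconsumed in the way described above.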

This message was sent by Atlassian JIRA