hadoop-mapreduce-dev mailing list archives

From "Liyin Liang (Created) (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAPREDUCE-3619) Change streaming code to use new mapreduce api.
Date Thu, 05 Jan 2012 12:25:39 GMT
Change streaming code to use new mapreduce api.

                 Key: MAPREDUCE-3619
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3619
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: contrib/streaming
    Affects Versions: 0.23.1
            Reporter: Liyin Liang

If we run a streaming job with the following Python script as the mapper or reducer, the job
throws a NullPointerException.
import sys
class MyTask:
  def __init__(self, file=sys.stdin):
    self.file = file
    # Counter updates are written to stderr as soon as the task object is created.
    print >>sys.stderr, "reporter:counter:spam,disp_flag_record,0"
    print >>sys.stderr, "reporter:counter:spam,spam_record,0"
  def process(self):
    # Echo every input line until end of input.
    while True:
      line = self.file.readline()
      if not line:
        break
      print line

if __name__ == "__main__":
  task = MyTask()
  task.process()

Here is the NPE-related log:
2011-12-22 14:14:06,310 WARN org.apache.hadoop.streaming.PipeMapRed: java.lang.NullPointerException
	at org.apache.hadoop.streaming.PipeMapRed$MRErrorThread.incrCounter(PipeMapRed.java:502)
	at org.apache.hadoop.streaming.PipeMapRed$MRErrorThread.run(PipeMapRed.java:444)

This is because the script's "print >>sys.stderr" statements cause reporter.incrCounter() to be
invoked during PipeMapper|PipeReducer.configure(), but no reporter is available in configure(),
so the reporter is still null at that point.
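
Until the streaming code itself is changed, one possible script-side workaround is to hold back the counter lines until the first input line has been read. The sketch below rests on an assumption that is not stated in this report: that the reporter has been handed to the streaming code by the time records start reaching the script. The counter_line helper, the main function, and the injectable infile/out/err parameters are illustrative names, not part of the original script. It uses write() calls rather than print so it runs under both Python 2 and Python 3.

```python
import sys

def counter_line(group, counter, amount):
    """Format a streaming counter update in the reporter:counter form."""
    return "reporter:counter:%s,%s,%d\n" % (group, counter, amount)

class MyTask(object):
    def __init__(self, infile=sys.stdin, out=sys.stdout, err=sys.stderr):
        self.infile = infile
        self.out = out
        self.err = err
        # Nothing is written to stderr here, so no counter update is sent
        # while the task is still being constructed.
        self.counters_initialized = False

    def init_counters(self):
        # Emit the zero-valued counters once, after input has started arriving.
        if not self.counters_initialized:
            self.err.write(counter_line("spam", "disp_flag_record", 0))
            self.err.write(counter_line("spam", "spam_record", 0))
            self.counters_initialized = True

    def process(self):
        # readline() returns "" at end of input, which stops the iteration.
        for line in iter(self.infile.readline, ""):
            self.init_counters()
            self.out.write(line)

def main():
    MyTask().process()
```

To use this as a mapper or reducer script, call main() at the bottom of the file. One trade-off: a task that receives no input never emits the zero-valued counters at all.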

To fix this problem, we should change the streaming code to use the new mapreduce API. Then we can
call context.getCounter() in the Mapper|Reducer.setup() function.
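
The difference between the two lifecycles can be illustrated with a small toy model. This is not Hadoop code: Counters, OldStyleTask, NewStyleTask and parse_counter are hypothetical names invented for the illustration, and Python's AttributeError on None stands in for the Java NullPointerException. The model only shows why a counter line that arrives before a reporter exists fails, and why it succeeds when the context is already available during setup.

```python
class Counters(object):
    """Hypothetical stand-in for the framework's counter sink."""
    def __init__(self):
        self.values = {}

    def incr(self, group, name, amount):
        key = (group, name)
        self.values[key] = self.values.get(key, 0) + amount

def parse_counter(line):
    """Parse 'reporter:counter:<group>,<counter>,<amount>'; None if no match."""
    prefix = "reporter:counter:"
    if not line.startswith(prefix):
        return None
    group, name, amount = line[len(prefix):].strip().split(",")
    return group, name, int(amount)

class OldStyleTask(object):
    """Old-API model: configure() runs before any reporter is available."""
    def __init__(self):
        self.reporter = None  # only set later, once records are processed

    def configure(self, stderr_lines):
        for line in stderr_lines:
            parsed = parse_counter(line)
            if parsed is not None:
                # Fails here: self.reporter is still None.
                self.reporter.incr(*parsed)

class NewStyleTask(object):
    """New-API model: setup() already receives the context."""
    def setup(self, context, stderr_lines):
        for line in stderr_lines:
            parsed = parse_counter(line)
            if parsed is not None:
                context.incr(*parsed)
```

In this model, OldStyleTask().configure(["reporter:counter:spam,spam_record,0"]) raises AttributeError, while NewStyleTask().setup(Counters(), [...]) with the same line records the counter.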
