hadoop-mapreduce-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: division by zero in getLocalPathForWrite()
Date Mon, 14 Jan 2013 15:01:13 GMT
MAPREDUCE-4940 has been logged.

Thanks

On Mon, Jan 14, 2013 at 4:34 AM, Steve Loughran <stevel@hortonworks.com> wrote:

> It certainly looks possible. Can you file a JIRA issue on the problem?
>
> On 13 January 2013 16:39, Ted Yu <yuzhihong@gmail.com> wrote:
>
> > I found this error again, see
> >
> > https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/345/testReport/org.apache.hadoop.hbase.mapreduce/TestImportExport/testSimpleCase/
> >
> > 2013-01-12 11:53:52,809 WARN  [AsyncDispatcher event handler]
> > resourcemanager.RMAuditLogger(255): USER=jenkins
> >  OPERATION=Application
> > Finished - Failed       TARGET=RMAppManager     RESULT=FAILURE
> >  DESCRIPTION=App
> > failed with state: FAILED       PERMISSIONS=Application
> > application_1357991604658_0002 failed 1 times due to AM Container for
> > appattempt_1357991604658_0002_000001 exited with  exitCode: -1000 due
> > to: java.lang.ArithmeticException: / by zero
> >         at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:368)
> >         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
> >         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
> >         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:115)
> >         at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getLocalPathForWrite(LocalDirsHandlerService.java:279)
> >         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:851)
> >
> > .Failing this attempt.. Failing the
> > application.    APPID=application_1357991604658_0002
> > Here is the related code:
> >
> >         // Keep rolling the wheel till we get a valid path
> >         Random r = new java.util.Random();
> >         while (numDirsSearched < numDirs && returnPath == null) {
> >           long randomPosition = Math.abs(r.nextLong()) % totalAvailable;
> >
> > My guess is that totalAvailable was 0, meaning dirDF was empty.
> >
> > Please advise whether that scenario is possible.
> >
> > Cheers
> >
> > On Tue, Oct 30, 2012 at 9:33 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> > > Thanks for the investigation Kihwal.
> > >
> > > I will keep an eye on future test failures in TestRowCounter.
> > >
> > >
> > > On Tue, Oct 30, 2012 at 9:29 AM, Kihwal Lee <kihwal@yahoo-inc.com> wrote:
> > >
> > >> Ted,
> > >>
> > > >> I couldn't reproduce it by just running the test case. When you
> > > >> reproduce it, look at the stderr/stdout file somewhere under
> > > >> target/org.apache.hadoop.mapred.MiniMRCluster. Look for the one under
> > > >> the directory whose name contains the app id.
> > > >>
> > > >> I did run into a similar problem and the stderr said:
> > > >> /bin/bash: /bin/java: No such file or directory
> > > >>
> > > >> It was because JAVA_HOME was not set. But in that case the exit code
> > > >> was 127 (shell not being able to locate the command to exec). In the
> > > >> Hudson job, the exit code was 1, so I think it's something else.
> > >>
> > >> Kihwal
> > >>
> > >> On 10/29/12 11:56 PM, "Ted Yu" <yuzhihong@gmail.com> wrote:
> > >>
> > >> >TestRowCounter still fails:
> > >> >
> > >>
> >
> https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/244/testReport/j
> > >>
> > >>
> >
> >unit/org.apache.hadoop.hbase.mapreduce/TestRowCounter/testRowCounterNoColu
> > >> >mn/
> > >> >
> > >> >but there was no 'divide by zero' exception.
> > >> >
> > >> >Cheers
> > >> >
> > > >> >On Thu, Oct 25, 2012 at 8:04 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >> >
> > >> >> I will try 2.0.2-alpha release.
> > >> >>
> > >> >> Cheers
> > >> >>
> > >> >>
> > > >> >> On Thu, Oct 25, 2012 at 7:54 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >> >>
> > >> >>> Thanks for the quick response, Robert.
> > >> >>> Here is the hadoop version being used:
> > >> >>>     <hadoop-two.version>2.0.1-alpha</hadoop-two.version>
> > >> >>>
> > > >> >>> If there is a newer release, I am willing to try that before
> > > >> >>> filing a JIRA.
> > >> >>>
> > >> >>>
> > > >> >>> On Thu, Oct 25, 2012 at 7:07 AM, Robert Evans <evans@yahoo-inc.com> wrote:
> > >> >>>
> > > >> >>>> It looks like you are running with an older version of 2.0, even
> > > >> >>>> though it does not really make much of a difference in this case.
> > > >> >>>> The issue shows up when getLocalPathForWrite thinks there is no
> > > >> >>>> space to write to on any of the disks it has configured. This could
> > > >> >>>> be because you do not have any directories configured. I really
> > > >> >>>> don't know for sure exactly what is happening. It might be disk
> > > >> >>>> fail in place removing disks for you because of other issues.
> > > >> >>>> Either way we should file a JIRA against Hadoop to make it so we
> > > >> >>>> never get the / by zero error and provide a better way to handle
> > > >> >>>> the possible causes.
> > >> >>>>
> > >> >>>> --Bobby Evans
> > >> >>>>
> > > >> >>>> On 10/24/12 11:54 PM, "Ted Yu" <yuzhihong@gmail.com> wrote:
> > >> >>>>
> > >> >>>> >Hi,
> > > >> >>>> >HBase has a Jenkins build against Hadoop 2.0.
> > >> >>>> >I was checking why TestRowCounter sometimes failed:
> > >> >>>> >
> > >> >>>>
> > >> >>>>
> > >>
> https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/231/testRepor
> > >> >>>>t/o
> > >> >>>>
> > >> >>>>
> > >>
> > >>
> >
> >>>>>rg.apache.hadoop.hbase.mapreduce/TestRowCounter/testRowCounterExclusiv
> > >> >>>>>eCol
> > >> >>>> >umn/
> > >> >>>> >
> > >> >>>> >I think the following could be the cause:
> > >> >>>> >
> > >> >>>> >2012-10-22 23:46:32,571 WARN  [AsyncDispatcher event
handler]
> > >> >>>> >resourcemanager.RMAuditLogger(255): USER=jenkins
> > >> >>>> OPERATION=Application
> > >> >>>> >Finished - Failed      TARGET=RMAppManager     RESULT=FAILURE
> > >> >>>>  DESCRIPTION=App
> > >> >>>> >failed with state: FAILED      PERMISSIONS=Application
> > >> >>>> >application_1350949562159_0002 failed 1 times due
to AM
> Container
> > >> for
> > >> >>>> >appattempt_1350949562159_0002_000001 exited with 
exitCode:
> -1000
> > >> due
> > >> >>>> >to: java.lang.ArithmeticException: / by zero
> > >> >>>> >       at
> > >> >>>>
> > >> >>>>
> > >>
> > >>
> >
> >>>>>org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPat
> > >> >>>>>hFor
> > >> >>>> >Write(LocalDirAllocator.java:355)
> > >> >>>> >       at
> > >> >>>>
> > >> >>>>
> > >>
> > >>
> >
> >>>>>org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAl
> > >> >>>>>loca
> > >> >>>> >tor.java:150)
> > >> >>>> >       at
> > >> >>>>
> > >> >>>>
> > >>
> > >>
> >
> >>>>>org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAl
> > >> >>>>>loca
> > >> >>>> >tor.java:131)
> > >> >>>> >       at
> > >> >>>>
> > >> >>>>
> > >>
> > >>
> >
> >>>>>org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAl
> > >> >>>>>loca
> > >> >>>> >tor.java:115)
> > >> >>>> >       at
> > >> >>>>
> > >> >>>>
> > >>
> > >>
> >
> >>>>>org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getL
> > >> >>>>>ocal
> > >> >>>> >PathForWrite(LocalDirsHandlerService.java:257)
> > >> >>>> >       at
> > >> >>>>
> > >> >>>>
> > >>
> > >>
> >
> >>>>>org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.R
> > >> >>>>>esou
> > >> >>>>
> > >> >>>>
> > >>
> > >>
> >
> >>>>>rceLocalizationService$LocalizerRunner.run(ResourceLocalizationService
> > >> >>>>>.jav
> > >> >>>> >a:849)
> > >> >>>> >
> > > >> >>>> >However, I don't seem to find where in getLocalPathForWrite()
> > > >> >>>> >division by zero could have arisen.
> > >> >>>> >
> > >> >>>> >Comment / hint is welcome.
> > >> >>>> >
> > >> >>>> >Thanks
> > >> >>>>
> > >> >>>>
> > >> >>>
> > >> >>
> > >>
> > >>
> > >
> >
>
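
The line quoted in the thread, Math.abs(r.nextLong()) % totalAvailable, throws "ArithmeticException: / by zero" whenever totalAvailable is 0, which matches the guess that no usable local directory was left. Below is a minimal Java sketch of that failure and of one possible guard. The class ZeroCapacityGuard and the method pickPosition are hypothetical names used only for illustration. They are not part of Hadoop, and this is not claimed to be the change made for MAPREDUCE-4940. The sketch only assumes that a zero total should be reported as an explicit error before the modulo is evaluated.

```java
import java.util.Random;

// Hypothetical illustration, not Hadoop code.
class ZeroCapacityGuard {

    // Picks a position in [0, totalAvailable), or fails with a clear message
    // when there is no capacity at all (assumed desired behaviour).
    static long pickPosition(Random r, long totalAvailable) {
        if (totalAvailable <= 0) {
            throw new IllegalArgumentException(
                "No space available in any of the local directories");
        }
        // floorMod never returns a negative value for a positive divisor.
        return Math.floorMod(r.nextLong(), totalAvailable);
    }

    public static void main(String[] args) {
        Random r = new Random();
        long totalAvailable = 0L; // the suspected state in the failing test run

        // Unguarded form, as in the quoted snippet: integer modulo by zero.
        try {
            long unused = Math.abs(r.nextLong()) % totalAvailable;
            System.out.println("unexpected result: " + unused);
        } catch (ArithmeticException e) {
            System.out.println("unguarded: " + e.getMessage());
        }

        // Guarded form: the caller gets a descriptive error instead.
        try {
            pickPosition(r, totalAvailable);
        } catch (IllegalArgumentException e) {
            System.out.println("guarded: " + e.getMessage());
        }

        // Normal case with some capacity available.
        System.out.println("position: " + pickPosition(r, 1024L));
    }
}
```

The sketch uses Math.floorMod (Java 8 and later) rather than Math.abs followed by %, because Math.abs(Long.MIN_VALUE) is still negative and would yield a negative position. That is a separate edge case from the division by zero discussed in this thread.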
