hadoop-mapreduce-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Steve Loughran <ste...@hortonworks.com>
Subject Re: division by zero in getLocalPathForWrite()
Date Mon, 14 Jan 2013 12:34:15 GMT
It certainly looks possible -can you file a JIRA issue on the problem?

On 13 January 2013 16:39, Ted Yu <yuzhihong@gmail.com> wrote:

> I found this error again, see
>
> https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/345/testReport/org.apache.hadoop.hbase.mapreduce/TestImportExport/testSimpleCase/
>
> 2013-01-12 11:53:52,809 WARN  [AsyncDispatcher event handler]
> resourcemanager.RMAuditLogger(255): USER=jenkins
>  OPERATION=Application
> Finished - Failed       TARGET=RMAppManager     RESULT=FAILURE
>  DESCRIPTION=App
> failed with state: FAILED       PERMISSIONS=Application
> application_1357991604658_0002 failed 1 times due to AM Container for
> appattempt_1357991604658_0002_000001 exited with  exitCode: -1000 due
> to: java.lang.ArithmeticException: / by zero
>         at
> org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:368)
>         at
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
>         at
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
>         at
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:115)
>         at
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getLocalPathForWrite(LocalDirsHandlerService.java:279)
>         at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:851)
>
> .Failing this attempt.. Failing the
> application.    APPID=application_1357991604658_0002
> Here is related code:
>
>         // Keep rolling the wheel till we get a valid path
>         Random r = new java.util.Random();
>         while (numDirsSearched < numDirs && returnPath == null) {
>           long randomPosition = Math.abs(r.nextLong()) % totalAvailable;
>
> My guess is that totalAvailable was 0, meaning dirDF was empty.
>
> Please advise whether that scenario is possible.
>
> Cheers
>
> On Tue, Oct 30, 2012 at 9:33 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> > Thanks for the investigation Kihwal.
> >
> > I will keep an eye on future test failure in TestRowCounter.
> >
> >
> > On Tue, Oct 30, 2012 at 9:29 AM, Kihwal Lee <kihwal@yahoo-inc.com>
> wrote:
> >
> >> Ted,
> >>
> >> I couldn't reproduce it by just running the test case. When you
> reproduce
> >> it, look at the stderr/stdout file somewhere under
> >> target/org.apache.hadoop.mapred.MiniMRCluster. Look for the one under
> the
> >> directory whose name containing the app id.
> >>
> >> I did run into a similar problem and the stderr said:
> >> /bin/bash: /bin/java: No such file or directory
> >>
> >> It was because JAVA_HOME was not set. But in this case the exit code was
> >> 127 (shell not being able to locate the command to exec). In the hudson
> >> job, the exit code was 1, so I think it's something else.
> >>
> >> Kihwal
> >>
> >> On 10/29/12 11:56 PM, "Ted Yu" <yuzhihong@gmail.com> wrote:
> >>
> >> >TestRowCounter still fails:
> >> >
> >>
> https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/244/testReport/j
> >>
> >>
> >unit/org.apache.hadoop.hbase.mapreduce/TestRowCounter/testRowCounterNoColu
> >> >mn/
> >> >
> >> >but there was no 'divide by zero' exception.
> >> >
> >> >Cheers
> >> >
> >> >On Thu, Oct 25, 2012 at 8:04 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> >> >
> >> >> I will try 2.0.2-alpha release.
> >> >>
> >> >> Cheers
> >> >>
> >> >>
> >> >> On Thu, Oct 25, 2012 at 7:54 AM, Ted Yu <yuzhihong@gmail.com>
wrote:
> >> >>
> >> >>> Thanks for the quick response, Robert.
> >> >>> Here is the hadoop version being used:
> >> >>>     <hadoop-two.version>2.0.1-alpha</hadoop-two.version>
> >> >>>
> >> >>> If there is newer release, I am willing to try that before filing
> >> JIRA.
> >> >>>
> >> >>>
> >> >>> On Thu, Oct 25, 2012 at 7:07 AM, Robert Evans
> >> >>><evans@yahoo-inc.com>wrote:
> >> >>>
> >> >>>> It looks like you are running with an older version of 2.0,
even
> >> >>>>though
> >> >>>> it
> >> >>>> does not really make much of a difference in this case,  The
issue
> >> >>>>shows
> >> >>>> up when getLocalPathForWrite thinks there is no space on to
write
> to
> >> >>>>on
> >> >>>> any of the disks it has configured.  This could be because
you do
> not
> >> >>>> have
> >> >>>> any directories configured.  I really don't know for sure exactly
> >> >>>>what is
> >> >>>> happening.  It might be disk fail in place removing disks for
you
> >> >>>>because
> >> >>>> of other issues. Either way we should file a JIRA against Hadoop
to
> >> >>>>make
> >> >>>> it so we never get the / by zero error and provide a better
way to
> >> >>>>handle
> >> >>>> the possible causes.
> >> >>>>
> >> >>>> --Bobby Evans
> >> >>>>
> >> >>>> On 10/24/12 11:54 PM, "Ted Yu" <yuzhihong@gmail.com>
wrote:
> >> >>>>
> >> >>>> >Hi,
> >> >>>> >HBase has Jenkins build against hadoop 2.0
> >> >>>> >I was checking why TestRowCounter sometimes failed:
> >> >>>> >
> >> >>>>
> >> >>>>
> >> https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/231/testRepor
> >> >>>>t/o
> >> >>>>
> >> >>>>
> >>
> >>
> >>>>>rg.apache.hadoop.hbase.mapreduce/TestRowCounter/testRowCounterExclusiv
> >> >>>>>eCol
> >> >>>> >umn/
> >> >>>> >
> >> >>>> >I think the following could be the cause:
> >> >>>> >
> >> >>>> >2012-10-22 23:46:32,571 WARN  [AsyncDispatcher event handler]
> >> >>>> >resourcemanager.RMAuditLogger(255): USER=jenkins
> >> >>>> OPERATION=Application
> >> >>>> >Finished - Failed      TARGET=RMAppManager     RESULT=FAILURE
> >> >>>>  DESCRIPTION=App
> >> >>>> >failed with state: FAILED      PERMISSIONS=Application
> >> >>>> >application_1350949562159_0002 failed 1 times due to AM
Container
> >> for
> >> >>>> >appattempt_1350949562159_0002_000001 exited with  exitCode:
-1000
> >> due
> >> >>>> >to: java.lang.ArithmeticException: / by zero
> >> >>>> >       at
> >> >>>>
> >> >>>>
> >>
> >>
> >>>>>org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPat
> >> >>>>>hFor
> >> >>>> >Write(LocalDirAllocator.java:355)
> >> >>>> >       at
> >> >>>>
> >> >>>>
> >>
> >>
> >>>>>org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAl
> >> >>>>>loca
> >> >>>> >tor.java:150)
> >> >>>> >       at
> >> >>>>
> >> >>>>
> >>
> >>
> >>>>>org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAl
> >> >>>>>loca
> >> >>>> >tor.java:131)
> >> >>>> >       at
> >> >>>>
> >> >>>>
> >>
> >>
> >>>>>org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAl
> >> >>>>>loca
> >> >>>> >tor.java:115)
> >> >>>> >       at
> >> >>>>
> >> >>>>
> >>
> >>
> >>>>>org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getL
> >> >>>>>ocal
> >> >>>> >PathForWrite(LocalDirsHandlerService.java:257)
> >> >>>> >       at
> >> >>>>
> >> >>>>
> >>
> >>
> >>>>>org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.R
> >> >>>>>esou
> >> >>>>
> >> >>>>
> >>
> >>
> >>>>>rceLocalizationService$LocalizerRunner.run(ResourceLocalizationService
> >> >>>>>.jav
> >> >>>> >a:849)
> >> >>>> >
> >> >>>> >However, I don't seem to find where in getLocalPathForWrite()
> >> >>>>division
> >> >>>> by
> >> >>>> >zero could have arisen.
> >> >>>> >
> >> >>>> >Comment / hint is welcome.
> >> >>>> >
> >> >>>> >Thanks
> >> >>>>
> >> >>>>
> >> >>>
> >> >>
> >>
> >>
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message