hadoop-yarn-dev mailing list archives

From Allen Wittenauer <...@effectivemachines.com>
Subject Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
Date Tue, 24 Oct 2017 23:04:52 GMT

My plan is currently to:

* switch some of Hadoop’s Yetus jobs over to my branch with the YETUS-561 patch to test it out
* if the tests work, work on getting YETUS-561 committed to yetus master
* switch jobs back to ASF yetus master, either post-YETUS-561 or without it if it doesn’t work
* go back to working on something else, regardless of the outcome


> On Oct 24, 2017, at 2:55 PM, Chris Douglas <cdouglas@apache.org> wrote:
> 
> Sean/Junping-
> 
> Ignoring the epistemology, it's a problem. Let's figure out what's
> causing memory to balloon and then we can work out the appropriate
> remedy.
> 
> Is this reproducible outside the CI environment? To Junping's point,
> would YETUS-561 provide more detailed information to aid debugging? -C
> 
> On Tue, Oct 24, 2017 at 2:50 PM, Junping Du <jdu@hortonworks.com> wrote:
>> In general, the "solid evidence" of a memory leak comes from analysis of a heap dump,
>> jstack output, GC logs, etc. In many cases we can locate and conclude which piece of
>> code is leaking memory from that analysis.
>> 
>> Unfortunately, I cannot find any such conclusion in the previous comments; they cannot
>> even tell which HDFS daemons/components consume unexpectedly high memory. That doesn't
>> sound like a solid bug report to me.
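[The evidence-gathering Junping describes usually starts with a handful of JDK tools run against the suspect JVM. A minimal sketch follows; the PID is a placeholder (no real process), so the commands are printed rather than executed:]

```shell
# Placeholder PID of the suspect HDFS test JVM; 12345 is not a real process
# here, so each command is echoed instead of run.
PID=12345
# Heap dump for offline analysis with Eclipse MAT, jhat, etc.
echo "jmap -dump:live,format=b,file=hdfs-test-heap.hprof ${PID}"
# Thread stacks, to see which tests were active when memory ballooned.
echo "jstack -l ${PID} > hdfs-test-threads.txt"
# GC utilization sampled every 5 seconds.
echo "jstat -gcutil ${PID} 5000"
```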
>> 
>> 
>> 
>> Thanks,
>> 
>> 
>> Junping
>> 
>> 
>> ________________________________
>> From: Sean Busbey <busbey@cloudera.com>
>> Sent: Tuesday, October 24, 2017 2:20 PM
>> To: Junping Du
>> Cc: Allen Wittenauer; Hadoop Common; Hdfs-dev; mapreduce-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
>> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
>> 
>> Just curious, Junping: what would "solid evidence" look like? Is the supposition here
>> that the memory leak is within HDFS test code rather than library runtime code? How
>> would such a distinction be shown?
>> 
>> On Tue, Oct 24, 2017 at 4:06 PM, Junping Du <jdu@hortonworks.com> wrote:
>> Allen,
>>     Do we have any solid evidence showing that the HDFS unit tests going through the
>> roof are due to a serious memory leak in HDFS? Normally I don't expect memory leaks to
>> be identified in our unit tests; mostly, a test JVM going away is just a test or
>> deployment issue.
>>     Unless there is concrete evidence, my concern about a serious memory leak in HDFS
>> 2.8 is relatively low, given that some companies (Yahoo, Alibaba, etc.) have had 2.8
>> deployed in large production environments for months. Non-serious memory leaks (like
>> forgetting to close a stream on a non-critical path) and other non-critical bugs always
>> happen here and there, and we have to live with them.
>> 
>> Thanks,
>> 
>> Junping
>> 
>> ________________________________________
>> From: Allen Wittenauer <aw@effectivemachines.com>
>> Sent: Tuesday, October 24, 2017 8:27 AM
>> To: Hadoop Common
>> Cc: Hdfs-dev; mapreduce-dev@hadoop.apache.org; yarn-dev@hadoop.apache.org
>> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
>> 
>>> On Oct 23, 2017, at 12:50 PM, Allen Wittenauer <aw@effectivemachines.com> wrote:
>>> 
>>> 
>>> 
>>> With no other information or access to go on, my current hunch is that one of the
>>> HDFS unit tests is ballooning in memory size.  The easiest way to kill a Linux machine
>>> is to eat all of its RAM, thanks to overcommit, and that's what this "feels" like.
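[The overcommit behavior Allen refers to is a kernel setting that can be inspected on any Linux box; a minimal sketch using standard procfs paths (the fallback value is only so the snippet degrades gracefully on non-Linux systems):]

```shell
# vm.overcommit_memory: 0 = heuristic (default), 1 = always allow, 2 = strict
# accounting. With 0 or 1 a runaway test can allocate past physical RAM and
# the OOM killer decides who dies.
mode=$(cat /proc/sys/vm/overcommit_memory 2>/dev/null || echo 0)
echo "vm.overcommit_memory = $mode"
# CommitLimit / Committed_AS show how far the kernel will let allocations go.
grep -E 'CommitLimit|Committed_AS' /proc/meminfo 2>/dev/null || true
```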
>>> 
>>> Someone should verify if 2.8.2 has the same issues before a release goes out ...
>> 
>> 
>>        FWIW, I ran 2.8.2 last night and it has the same problems.
>> 
>>        Also: the node didn't die!  Looking through the workspace (so the next run
>> will destroy them), two sets of logs stand out:
>> 
>> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
>> 
>> and
>> 
>> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/sourcedir/hadoop-hdfs-project/hadoop-hdfs/
>> 
>>        It looks like my hunch is correct: RAM usage in the HDFS unit tests is going
>> through the roof.  It's also interesting how MANY log files there are.  Is surefire not
>> picking up that jobs are dying?  Maybe not, if memory is getting tight.
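[If surefire really is losing forked JVMs silently, bounding each fork's heap and adding a fork timeout would make the deaths visible instead. A hedged sketch of a maven-surefire-plugin configuration; the element names are real surefire options, but the values are purely illustrative:]

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- cap each forked test JVM so the OS OOM killer isn't the first to notice -->
    <argLine>-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError</argLine>
    <!-- fail loudly if a fork hangs or dies instead of silently dropping it -->
    <forkedProcessTimeoutInSeconds>900</forkedProcessTimeoutInSeconds>
  </configuration>
</plugin>
```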
>> 
>>        Anyway, at this point, branch-2.8 and higher are probably fubar'd. Additionally,
>> I've filed YETUS-561 so that Yetus-controlled Docker containers can have their RAM limits
>> set, in order to prevent more nodes from going catatonic.
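[Mechanically, the container-level limit YETUS-561 aims for boils down to Docker's memory flags at container launch. A sketch that only builds the command string; the image name, module, and 4g limit are placeholders, not anything from YETUS-561 itself:]

```shell
MEM=4g   # placeholder per-container RAM cap
# --memory caps the container's RAM; setting --memory-swap to the same value
# disables swap, so a runaway test gets OOM-killed inside the container
# instead of taking down the whole build node.
cmd="docker run --rm --memory=${MEM} --memory-swap=${MEM} hadoop-build mvn -pl hadoop-hdfs-project test"
echo "$cmd"
```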
>> 
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: yarn-dev-unsubscribe@hadoop.apache.org
>> For additional commands, e-mail: yarn-dev-help@hadoop.apache.org
>> 
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
>> For additional commands, e-mail: common-dev-help@hadoop.apache.org
>> 
>> 
>> 
>> 
>> --
>> busbey
> 
> 



