hadoop-yarn-dev mailing list archives

From "Colin P. McCabe" <cmcc...@apache.org>
Subject Re: Erratic Jenkins behavior
Date Thu, 19 Feb 2015 07:14:25 GMT
Hmm.  I guess my thought would be that we would have a fixed number of
"slots" (i.e. executors on a single node with associated .m2
directories).  Then we wouldn't clear each .m2 in between runs, but we
would ensure that only one slot at a time had access to each .m2 directory.
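One way to realize these slots is a fixed pool of Maven local repositories, each guarded by an advisory lock file so that only one build uses a slot at a time. This is a minimal Python sketch under stated assumptions: the slot root directory and the helper names `acquire_slot` and `maven_command` are hypothetical, not part of any existing Hadoop or Jenkins setup. `-Dmaven.repo.local` is Maven's standard property for overriding the local repository location, and `fcntl.flock` is the kind of advisory lock mentioned later in the thread. How Jenkins would route builds onto slots is left open.

```python
import contextlib
import fcntl
import os
import tempfile


@contextlib.contextmanager
def acquire_slot(slot_root, n_slots):
    """Hypothetical helper: hold the first free slot exclusively.

    Each slot is a directory with its own Maven local repository and a
    lock file. An exclusive, non-blocking flock on the lock file marks the
    slot as busy; busy slots are skipped. The lock is released on exit.
    """
    for i in range(n_slots):
        slot_dir = os.path.join(slot_root, f"slot-{i}")
        repo_dir = os.path.join(slot_dir, "repository")
        os.makedirs(repo_dir, exist_ok=True)  # kept between runs, never cleared
        lock_file = open(os.path.join(slot_dir, ".lock"), "a")
        try:
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            # Another build holds this slot. flock locks belong to the open
            # file description, so closing this unlocked descriptor does not
            # release the lock held through a different one.
            lock_file.close()
            continue
        try:
            yield i, repo_dir
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
            lock_file.close()
        return
    raise RuntimeError(f"all {n_slots} slots are busy")


def maven_command(repo_dir, goals):
    """Point Maven at this slot's private local repository."""
    return ["mvn", f"-Dmaven.repo.local={repo_dir}", *goals]


if __name__ == "__main__":
    root = tempfile.mkdtemp()  # stand-in for a real per-node slot root
    with acquire_slot(root, 4) as (slot, repo):
        print(slot, maven_command(repo, ["install"]))
```

A build script would wrap its whole Maven invocation in `with acquire_slot(...)`, so the slot's repository is reused by later builds but never touched by two builds at once.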

In that case, build times wouldn't increase that much (or really at
all, until a dependency changed... right?).  When a dependency changed
we'd have to do O(N_slots) amount of work, but dependencies don't
change that often.
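To make the cost argument concrete, here is a toy model in Python (an illustration, not a measurement of the Hadoop builds): builds are assigned round-robin to a fixed number of slots, each dependency is a hypothetical (name, version) pair, and every artifact missing from the repository a build uses counts as one download. It compares persistent per-slot repositories against a fresh repository for every build. The workload numbers are made up.

```python
def count_downloads(builds, n_slots, persistent):
    """Toy model: count artifact downloads over a sequence of builds.

    builds     -- one collection of (artifact, version) pairs per build
    n_slots    -- number of slots; build k runs in slot k % n_slots
    persistent -- True: each slot keeps its local repository between runs;
                  False: every build starts from an empty repository
    """
    repos = [set() for _ in range(n_slots)]
    downloads = 0
    for k, deps in enumerate(builds):
        repo = repos[k % n_slots] if persistent else set()
        for dep in deps:
            if dep not in repo:
                repo.add(dep)
                downloads += 1
    return downloads


# Hypothetical workload: 100 builds with 50 dependencies each; one
# dependency is bumped to a new version halfway through.
old = [(f"dep{i}", 1) for i in range(50)]
new = old[:-1] + [("dep49", 2)]
builds = [old] * 50 + [new] * 50

print(count_downloads(builds, 4, persistent=True))   # 4*50 + 4*1 = 204
print(count_downloads(builds, 4, persistent=False))  # 100*50 = 5000
```

The extra work after the version bump is the `4*1` term: one download per slot, i.e. O(N_slots). Starting every build from an empty repository instead pays the full download each time.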

Of course, the current situation also generates a lot of extra work
because people need to rekick builds that failed for mystery reasons.


On Wed, Feb 18, 2015 at 9:53 AM, Chris Nauroth <cnauroth@hortonworks.com> wrote:
> I'm pretty sure there is no guarantee of isolation on a shared
> .m2/repository directory for multiple concurrent Maven processes.  I've
> had a theory for a while that one build running "mvn install" can
> overwrite the snapshot artifact that was just installed by another
> concurrent build.  This can create bizarre problems, for example if a
> patch introduces a new class in hadoop-common and then references that
> class from hadoop-hdfs.
> I expect using completely separate work directories for .m2/repository,
> the patch directory, and the Jenkins workspace could resolve this.  The
> typical cost for this kind of change is increased disk consumption and
> increased build time, since Maven would need to download dependencies
> fresh every time.
> Chris Nauroth
> Hortonworks
> http://hortonworks.com/
> On 2/12/15, 2:00 PM, "Colin P. McCabe" <cmccabe@apache.org> wrote:
>>We could potentially use different .m2 directories for each executor.
>>I think this has been brought up in the past as well.
>>I'm not sure how maven handles concurrent access to the .m2
>>directory... if it's not using flock or fcntl then it's not really
>>safe.  This might explain some of our missing class error issues.
>>On Tue, Feb 10, 2015 at 2:13 AM, Steve Loughran <stevel@hortonworks.com> wrote:
>>> Mvn is a dark mystery to us all. I wouldn't trust it not to pick up things
>>> from other builds if they ended up published to ~/.m2/repository during
>>> the process.
>>> On 9 February 2015 at 19:29:06, Colin P. McCabe
>>> (cmccabe@apache.org) wrote:
>>> I'm sorry, I don't have any insight into this. With regard to
>>> HADOOP-11084, I thought that $BUILD_URL would be unique for each
>>> concurrent build, which would prevent build artifacts from getting
>>> mixed up between jobs. Based on the value of PATCHPROCESS that Kihwal
>>> posted, perhaps this is not the case? Perhaps someone can explain how
>>> this is supposed to work (I am a Jenkins newbie).
>>> regards,
>>> Colin
>>> On Thu, Feb 5, 2015 at 10:42 AM, Yongjun Zhang <yzhang@cloudera.com> wrote:
>>>> Thanks Kihwal for bringing this up.
>>>> Seems related to:
>>>> https://issues.apache.org/jira/browse/HADOOP-11084
>>>> Hi Andrew/Arpit/Colin/Steve, you guys worked on this JIRA before; any
>>>> insight into the issue Kihwal described?
>>>> Thanks.
>>>> --Yongjun
>>>> On Thu, Feb 5, 2015 at 9:49 AM, Kihwal Lee
>>>> wrote:
>>>>> I am sure many of us have seen strange Jenkins behavior from the
>>>>> precommit builds:
>>>>> - build artifacts missing
>>>>> - build artifacts belonging to another build being served. This also
>>>>> causes wrong precommit results to be posted on the bug.
>>>>> - etc.
>>>>> The latest one I saw was the disappearance of the unit test stdout/stderr
>>>>> during a build. After a successful run of unit tests, the file vanished,
>>>>> so the script could not cat it. It looked like another build process had
>>>>> deleted it while this build was in progress.
>>>>> It might have something to do with the fact that the patch-dir is set
>>>>> to the following:
>>>>> I don't have access to the Jenkins build configs or the build machines,
>>>>> so I can't debug it further, but I think we need to take care of it
>>>>> sooner or later. Can anyone help?
>>>>> Kihwal
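Kihwal's report suggests that concurrent builds were sharing one patch directory. Whatever the actual PATCHPROCESS setting was (its value is not preserved in this archive), a per-build directory can be made unique by construction instead of relying on environment variables such as $BUILD_URL. This is a minimal Python sketch under stated assumptions: the base directory and the helper name `make_patch_dir` are hypothetical. It uses Jenkins's standard `BUILD_TAG` variable only as a readable prefix, while `tempfile.mkdtemp` atomically creates a new directory, so no two callers receive the same path.

```python
import os
import re
import tempfile


def make_patch_dir(base_dir, env=None):
    """Hypothetical helper: create a private patch directory for one build.

    BUILD_TAG ("jenkins-${JOB_NAME}-${BUILD_NUMBER}") is a standard Jenkins
    variable, used here only to make the directory name readable. Uniqueness
    comes from tempfile.mkdtemp, which creates a new directory atomically, so
    concurrent builds never share (or clean up) each other's directory.
    """
    env = os.environ if env is None else env
    tag = env.get("BUILD_TAG", "build")
    prefix = re.sub(r"[^A-Za-z0-9._-]", "_", tag) + "-"  # keep the name filesystem-safe
    os.makedirs(base_dir, exist_ok=True)
    return tempfile.mkdtemp(prefix=prefix, dir=base_dir)
```

Each build would then write its test output under its own directory and delete only that directory when it finishes, so one build's cleanup cannot remove another build's stdout/stderr files.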
