hadoop-mapreduce-dev mailing list archives

From Chris Douglas <cdoug...@apache.org>
Subject Re: [VOTE] Release Apache Hadoop 2.9.0 (RC0)
Date Thu, 09 Nov 2017 17:51:01 GMT
The labor required for these release formalisms is exceeding their
value. Our minor releases have more bugs than our patch releases (we
hope), but every consumer should understand how software versioning
works. Every device I own has bugs on major OS updates. That doesn't
imply that every minor release is strictly less stable than a patch
release, or that users need to be warned off it.

In contrast, we should warn users about features that compromise
invariants like security or durability, either by design or due to
their early stage of development. We can't reasonably expect them to
understand those tradeoffs, since they depend on internal details of
Hadoop.

On Wed, Nov 8, 2017 at 5:34 PM, Vinod Kumar Vavilapalli
<vinodkv@apache.org> wrote:
> When we tried option (b), we used to make .0 as a GA release, but downstream projects
> like Tez, Hive, Spark would come back and find an incompatible change - and now we were forced
> into a conundrum - is fixing this incompatible change itself an incompatibility?

Every project takes these case by case. Most of the time we'll
accommodate the old semantics (and we try to be explicit where we
promise compatibility), but this isn't a logic problem, it's a
practical one. If it's an easy fix to an obscure API, we probably
won't even hear about it.

> Long story short, I'd just add to your voting thread and release notes that 2.9.0 still
> needs to be tested downstream and so users may want to wait for subsequent point releases.

It's uncomfortable to have four active release branches, with 3.1
coming in early 2018. We all benefit from the shared deployment
experiences that harden these releases, and fragmentation creates
incentives to compete for that attention. Rather than tacitly
scuffling over waning interest in the 2.x series, I'd endorse your
other thread encouraging consolidation around 3.x.

To that end, there is no policy or precedent that requires that new
minor releases be labeled as "alpha". If there is cause to believe
that 2.9.0 is not ready to release in the stable line, then we
shouldn't release it. -C

>> On Nov 8, 2017, at 12:43 AM, Subru Krishnan <subru@apache.org> wrote:
>>
>> We are canceling the RC due to the issue that Rohith/Sunil identified. The
>> issue was difficult to track down as it only happens when you use IP for ZK
>> (works fine with host names) and moreover if ZK and RM are co-located on
>> same machine. We are hopeful to get the fix in tomorrow and roll out RC1.
>>
>> Thanks to everyone for the extensive testing/validation. Hopefully cost to
>> replicate with RC1 is much lower.
>>
>> -Subru/Arun.
>>
>> On Tue, Nov 7, 2017 at 5:27 PM, Konstantinos Karanasos <kkaranasos@gmail.com>
>> wrote:
>>
>>> +1 from me too.
>>>
>>> Did the following:
>>> 1) set up a 9-node cluster;
>>> 2) ran some Gridmix jobs;
>>> 3) ran (2) after enabling opportunistic containers (used a mix of
>>> guaranteed and opportunistic containers for each job);
>>> 4) ran (3) but this time enabling distributed scheduling of opportunistic
>>> containers.
>>>
>>> All the above worked with no issues.
>>>
>>> Thanks for all the effort guys!
>>>
>>> Konstantinos
>>>
>>> On Tue, Nov 7, 2017 at 2:56 PM, Eric Badger <ebadger@oath.com.invalid>
>>> wrote:
>>>
>>>> +1 (non-binding) pending the issue that Sunil/Rohith pointed out
>>>>
>>>> - Verified all hashes and checksums
>>>> - Built from source on macOS 10.12.6, Java 1.8.0u65
>>>> - Deployed a pseudo cluster
>>>> - Ran some example jobs
>>>>
>>>> Thanks,
>>>>
>>>> Eric
>>>>
>>>> On Tue, Nov 7, 2017 at 4:03 PM, Wangda Tan <wheeleast@gmail.com> wrote:
>>>>
>>>>> Sunil / Rohith,
>>>>>
>>>>> Could you check whether your configs are the same as the configs Jonathan posted?
>>>>> https://issues.apache.org/jira/browse/YARN-7453?focusedCommentId=16242693&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16242693
>>>>>
>>>>> And could you check whether the issue still reproduces with Jonathan's
>>>>> configs?
>>>>>
>>>>> Thanks,
>>>>> Wangda
>>>>>
>>>>>
>>>>> On Tue, Nov 7, 2017 at 1:52 PM, Arun Suresh <asuresh@apache.org> wrote:
>>>>>
>>>>>> Thanks for testing Rohith and Sunil
>>>>>>
>>>>>> Can you please confirm if it is not a config issue at your end?
>>>>>> We (both Jonathan and myself) just tried testing this on a fresh cluster
>>>>>> (both automatic and manual) and we are not able to reproduce this. I've
>>>>>> updated the YARN-7453 <https://issues.apache.org/jira/browse/YARN-7453> JIRA
>>>>>> with details of testing.
>>>>>>
>>>>>> Cheers
>>>>>> -Arun/Subru
>>>>>>
>>>>>> On Tue, Nov 7, 2017 at 3:17 AM, Rohith Sharma K S <rohithsharmaks@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks Sunil for confirmation. Btw, I have raised YARN-7453
>>>>>>> <https://issues.apache.org/jira/browse/YARN-7453> JIRA to track this
>>>>>>> issue.
>>>>>>>
>>>>>>> - Rohith Sharma K S
>>>>>>>
>>>>>>> On 7 November 2017 at 16:44, Sunil G <sunilg@apache.org> wrote:
>>>>>>>
>>>>>>>> Hi Subru and Arun.
>>>>>>>>
>>>>>>>> Thanks for driving 2.9 release. Great work!
>>>>>>>>
>>>>>>>> I installed a cluster built from source.
>>>>>>>> - Ran a few MR jobs with application priority enabled. Runs fine.
>>>>>>>> - Accessed the new UI and it also seems fine.
>>>>>>>>
>>>>>>>> However, I am also getting the same issue that Rohith reported.
>>>>>>>> - Started an HA cluster
>>>>>>>> - Pushed RM to standby
>>>>>>>> - Pushed RM back to active, then saw an exception.
>>>>>>>>
>>>>>>>> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>>>>>>>>        at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>>>>>>>>        at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>>>>>>>>
>>>>>>>> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>>>>>>>>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>>>>>>>>        at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>>>>>>>>
>>>>>>>> Will check and post more details,
>>>>>>>>
>>>>>>>> - Sunil
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Nov 7, 2017 at 12:47 PM Rohith Sharma K S <rohithsharmaks@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks Subru/Arun for the great work!
>>>>>>>>>
>>>>>>>>> Downloaded source and built from it. Deployed RM HA non-secured cluster
>>>>>>>>> along with new YARN UI and ATSv2.
>>>>>>>>>
>>>>>>>>> I am facing a basic RM HA switch issue after the first successful start.
>>>>>>>>> *Is anyone else facing this issue?*
>>>>>>>>>
>>>>>>>>> When RM is switched from ACTIVE to STANDBY to ACTIVE, RM never switches to
>>>>>>>>> active successfully. The exception trace I see in the log is:
>>>>>>>>>
>>>>>>>>> 2017-11-07 12:35:56,540 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
>>>>>>>>> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>>>>>>>>>    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:894)
>>>>>>>>>    at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
>>>>>>>>>    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
>>>>>>>>>    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
>>>>>>>>> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
>>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
>>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>>>>>>>>>    ... 4 more
>>>>>>>>> Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>>>>>>>>>    at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>>>>>>>>>    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:205)
>>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
>>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
>>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1167)
>>>>>>>>>    at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>>>    at javax.security.auth.Subject.doAs(Subject.java:422)
>>>>>>>>>    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
>>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1167)
>>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>>>>>>>>>    ... 5 more
>>>>>>>>> Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>>>>>>>>>    at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>>>>>>>>>    at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>>>>>>>>>    at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>>>>>>>>>    at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>>>>>>>>>    at org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>>>>>>>>>    at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>>>>>>>>>    at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
>>>>>>>>>    at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>>>>>>>>>    at org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
>>>>>>>>>    at org.apache.hadoop.util.curator.ZKCuratorManager$SafeTransaction.commit(ZKCuratorManager.java:403)
>>>>>>>>>    at org.apache.hadoop.util.curator.ZKCuratorManager.safeSetData(ZKCuratorManager.java:372)
>>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:493)
>>>>>>>>>    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:754)
>>>>>>>>>    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>>>>>>>>>    ... 13 more
>>>>>>>>>
>>>>>>>>> Thanks & Regards
>>>>>>>>> Rohith Sharma K S
>>>>>>>>>
>>>>>>>>> On 4 November 2017 at 04:20, Arun Suresh <asuresh@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Hi folks,
>>>>>>>>>>
>>>>>>>>>>     Apache Hadoop 2.9.0 is the first stable release of the Hadoop 2.9 line and
>>>>>>>>>> will be the latest stable/production release for Apache Hadoop - it
>>>>>>>>>> includes 30 New Features with 500+ subtasks, 407 Improvements, and 787 Bug
>>>>>>>>>> fixes, all newly fixed issues since 2.8.2.
>>>>>>>>>>
>>>>>>>>>>      More information about the 2.9.0 release plan can be found here:
>>>>>>>>>> https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9
>>>>>>>>>>
>>>>>>>>>>      New RC is available at:
>>>>>>>>>> http://home.apache.org/~asuresh/hadoop-2.9.0-RC0/
>>>>>>>>>>
>>>>>>>>>>      The RC tag in git is: release-2.9.0-RC0, and the latest commit id is:
>>>>>>>>>> 6697f0c18b12f1bdb99cbdf81394091f4fef1f0a
>>>>>>>>>>
>>>>>>>>>>      The maven artifacts are available via repository.apache.org at:
>>>>>>>>>> https://repository.apache.org/content/repositories/orgapachehadoop-1065/
>>>>>>>>>>
>>>>>>>>>>      Please try the release and vote; the vote will run for the usual 5
>>>>>>>>>> days, ending on 11/10/2017 4pm PST time.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Arun/Subru
>>>>>>>>>>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: mapreduce-dev-unsubscribe@hadoop.apache.org
> For additional commands, e-mail: mapreduce-dev-help@hadoop.apache.org