hadoop-mapreduce-dev mailing list archives

From Kuhu Shukla <kshu...@yahoo-inc.com.INVALID>
Subject Re: [VOTE] Release Apache Hadoop 3.0.0-alpha2 RC0
Date Wed, 25 Jan 2017 16:07:03 GMT
+1 (non-binding)
* Built from source
* Deployed on a pseudo-distributed cluster (MAC)
* Ran wordcount and sleep jobs.

    On Wednesday, January 25, 2017 3:21 AM, Marton Elek <melek@hortonworks.com> wrote:


I also did a quick smoketest with the provided 3.0.0-alpha2 binaries:

TLDR; It works well

 * 5 hosts, docker based hadoop cluster, each component in a separate container (5 datanode/5
 * Components are:
  * Hdfs/Yarn cluster (upgraded 2.7.3 to 3.0.0-alpha2 using the binary package for vote)
  * Zeppelin 0.6.2/0.7.0-RC2
  * Spark 2.0.2/2.1.0
  * HBase 1.2.4 + zookeeper
  * + additional docker containers for configuration management and monitoring
* No HA, no kerberos, no wire encryption

 * HDFS cluster upgraded successfully from 2.7.3 (with about 200G data)
 * Imported 100G data to HBase successfully
 * Started Spark jobs to process 1G json from HDFS (using spark-master/slave cluster). It
worked even when I used the Zeppelin 0.6.2 + Spark 2.0.2 (with old hadoop client included).
Obviously the old version can't use the new Yarn cluster as the token file format has been
 * I upgraded my setup to use Zeppelin 0.7.0-RC2/Spark 2.1.0(distribution without hadoop)/hadoop
3.0.0-alpha2. It also worked well: processed the same json files from HDFS with spark jobs
(from zeppelin) using yarn cluster (master: yarn deploy-mode: cluster)
 * Started spark jobs (with spark submit, master: yarn) to count records from the hbase database:
 * Started the example MapReduce jobs from the distribution over yarn. It was OK but only with specific
configuration (see below)

So my overall impression is that it works very well (at least with my 'smalldata').

Some notes (none of them are blocking):

1. To run the example mapreduce jobs I defined HADOOP_MAPRED_HOME at command line:
./bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-alpha2.jar pi -Dyarn.app.mapreduce.am.env="HADOOP_MAPRED_HOME={{HADOOP_COMMON_HOME}}"
-Dmapreduce.admin.user.env="HADOOP_MAPRED_HOME={{HADOOP_COMMON_HOME}}" 10 10

And in the yarn-site:


I don't know the exact reason for the change, but 2.7.3 was more user-friendly, as the example
could be run without specific configuration.

For the same reason I didn't start the hbase mapreduce job with the hbase command line app (there
could be some option for hbase to define MAPRED_HOME_DIR as well, but by default I got a ClassNotFoundException
for one of the MR classes).

2. For the record: the logging and htrace classes are excluded from the shaded hadoop client
jar, so I added them manually one by one to spark (spark 2.1.0 distribution without hadoop):

RUN wget `cat url` -O spark.tar.gz && tar zxf spark.tar.gz && rm spark.tar.gz \
 && mv spark* spark
RUN cp /opt/hadoop/share/hadoop/client/hadoop-client-api-3.0.0-alpha2.jar /opt/spark/jars
RUN cp /opt/hadoop/share/hadoop/client/hadoop-client-runtime-3.0.0-alpha2.jar /opt/spark/jars
# The destination of the next two ADD lines was missing; /opt/spark/jars/ is assumed, as for the other jars.
ADD https://repo1.maven.org/maven2/org/slf4j/slf4j-log4j12/1.7.10/slf4j-log4j12-1.7.10.jar /opt/spark/jars/
ADD https://repo1.maven.org/maven2/org/apache/htrace/htrace-core4/4.1.0-incubating/htrace-core4-4.1.0-incubating.jar /opt/spark/jars/
ADD https://repo1.maven.org/maven2/org/slf4j/slf4j-api/1.7.10/slf4j-api-1.7.10.jar /opt/spark/jars/
ADD https://repo1.maven.org/maven2/log4j/log4j/1.2.17/log4j-1.2.17.jar /opt/spark/jars/

With these jar files, spark 2.1.0 works well with the alpha2 version of HDFS and YARN.

3. The message "Upgrade in progress. Not yet finalized." did not disappear from the namenode
webui, but the cluster works well.

Most probably I missed a step, but it's a little bit confusing.

(I checked the REST call: it is the jmx bean that reports that the upgrade was not yet finalized; the
code of the webpage seems to be ok.)
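
One possible explanation for note 3 is that the upgrade was never finalized. The sketch below shows how that could be checked, under stated assumptions: the host name is a placeholder, port 9870 is assumed to be the NameNode HTTP port (2.x releases used 50070), and the bean and attribute names (NameNodeInfo, UpgradeFinalized) are assumed to be the ones the web UI reads. The helper names are hypothetical.

```shell
#!/bin/sh
# Build the JMX query URL for the NameNodeInfo bean.
# $1 = NameNode host (placeholder), $2 = HTTP port (assumed default 9870).
namenode_jmx_url() {
    host=$1
    port=${2:-9870}
    printf 'http://%s:%s/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo\n' "$host" "$port"
}

# Finalize the upgrade, then print the line carrying the finalized flag.
# Only defined here, never called: finalizing removes the option to roll
# back to the previous version, so run it deliberately.
finalize_and_check() {
    hdfs dfsadmin -finalizeUpgrade &&
        curl -s "$(namenode_jmx_url "$1" "$2")" | grep -i 'UpgradeFinalized'
}
```

Calling `finalize_and_check namenode.example.com` on a test cluster would show whether the bean's flag changes after finalization.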


On Jan 25, 2017, at 8:38 AM, Yongjun Zhang <yjzhangal@apache.org> wrote:

Thanks Andrew much for the work here!

+1 (binding).

- Downloaded both binary and src tarballs
- Verified md5 checksum and signature for both
- Built from source tarball
- Deployed 2 pseudo clusters, one with the released tarball and the other
 with what I built from source, and did the following on both:
    - Run basic HDFS operations, snapshots and distcp jobs
    - Run pi job
    - Examined HDFS webui, YARN webui.



On Tue, Jan 24, 2017 at 3:56 PM, Eric Badger <ebadger@yahoo-inc.com.invalid> wrote:

+1 (non-binding)
- Verified signatures and md5
- Built from source
- Started single-node cluster on my mac
- Ran some sleep jobs

  On Tuesday, January 24, 2017 4:32 PM, Yufei Gu <flyrain000@gmail.com> wrote:

Hi Andrew,

Thanks for working on this.

+1 (Non-Binding)

1. Downloaded the binary and verified the md5.
2. Deployed it on a 3-node cluster with 1 ResourceManager and 2 NodeManagers.
3. Set YARN to use Fair Scheduler.
4. Ran the MapReduce Pi job.
5. Verified Hadoop version command output is correct.



On Tue, Jan 24, 2017 at 3:02 AM, Marton Elek <melek@hortonworks.com> wrote:

minicluster is kind of weird on filesystems that don't support mixed
case, like OS X's default HFS+.

$  jar tf hadoop-client-minicluster-3.0.0-alpha3-SNAPSHOT.jar | grep

I added a patch to https://issues.apache.org/jira/browse/HADOOP-14018 to
add the missing META-INF/LICENSE.txt to the shaded files.

Question: what should be done with the other LICENSE files in the
minicluster? Can we just exclude them (from a legal point of view)?
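
The mixed-case problem mentioned above can be illustrated with a small filter. This is a hypothetical helper and not the truncated `grep` command from the message: it reads archive entry paths on stdin and prints, in lowercase, every path that occurs more than once when case and trailing slashes are ignored. It only detects whole-path collisions.

```shell
#!/bin/sh
# Print each entry path (lowercased) that collides with another entry
# on a case-insensitive filesystem. Trailing slashes are dropped so that
# a directory entry can collide with a file of the same name.
case_collisions() {
    sed 's:/*$::' | tr '[:upper:]' '[:lower:]' | sort | uniq -d
}

# Example use against the jar named in the message above:
#   jar tf hadoop-client-minicluster-3.0.0-alpha3-SNAPSHOT.jar | case_collisions
```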


