spark-user mailing list archives

From Matt Cheah <mch...@palantir.com>
Subject Using Distcp when EC2 deployed with CDH4
Date Fri, 06 Dec 2013 20:42:27 GMT
Hi everyone,

I used to launch EC2 clusters with the spark-ec2 scripts running Hadoop 1. I recently changed that and launched a new cluster with the Hadoop major version set to 2:

spark-ec2 <args> --hadoop-major-version=2 <more-args>
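
For reference, the full launch command I use is along these lines (the key pair, identity file, and cluster name here are placeholders, not my real values):

./spark-ec2 -k my-keypair -i ~/my-keypair.pem --hadoop-major-version=2 launch my-cluster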

In the old cluster, I would start persistent-hdfs and migrate data from S3 using distcp:

persistent-hdfs/bin/hadoop distcp <src> <dst>
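
Concretely, a copy out of S3 looked something like this (the bucket and paths are invented for the example):

persistent-hdfs/bin/hadoop distcp s3n://my-bucket/data hdfs:///user/root/data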

However, when I do the same thing on the new cluster, I get an error:

/root/persistent-hdfs/sbin/start-all.sh
/root/persistent-hdfs/bin/hadoop distcp <src> <dst>

2013-12-06 20:38:44,808 INFO  mapreduce.Cluster (Cluster.java:initialize(114)) - Failed to use org.apache.hadoop.mapred.LocalClientProtocolProvider due to error: Invalid "mapreduce.jobtracker.address" configuration value for LocalJobRunner : "ec2-54-193-48-31.us-west-1.compute.amazonaws.com:9001"
2013-12-06 20:38:44,809 ERROR tools.DistCp (DistCp.java:run(126)) - Exception encountered
java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
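
For what it's worth, I gather the Hadoop 2 client picks its job-submission framework from mapreduce.framework.name (valid values being local, classic, or yarn), and the error suggests it defaulted to local here. An explicit setting in persistent-hdfs/conf/mapred-site.xml would look something like the following (the value is just my illustration; I haven't confirmed what spark-ec2 actually writes there):

<property>
  <name>mapreduce.framework.name</name>
  <value>classic</value>  <!-- or "yarn"/"local"; illustrative only -->
</property>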

I'm wondering how the cluster is configured differently when Hadoop 2 is specified to the EC2 scripts, and why distcp isn't working here. Thanks!

-Matt Cheah
