spark-user mailing list archives

From Matt Cheah <>
Subject Using Distcp when EC2 deployed with CDH4
Date Fri, 06 Dec 2013 20:42:27 GMT
Hi everyone,

I used to launch EC2 clusters running Hadoop 1 with the spark-ec2 scripts. I recently
changed that and launched a new cluster with the Hadoop major version set to 2:

spark-ec2 <args> --hadoop-major-version=2 <more-args>
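(For context, the full launch invocation was along these lines; the key pair, identity file, slave count, and cluster name below are placeholders, not the actual values I used:)

```shell
# Placeholder key pair, identity file, slave count, and cluster name.
./spark-ec2 -k my-keypair -i ~/.ssh/my-keypair.pem -s 2 \
  --hadoop-major-version=2 \
  launch my-cluster
```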

In the old cluster, I would start persistent-hdfs and migrate data from S3 using distcp:

persistent-hdfs/bin/hadoop distcp <src> <dst>

However, when I do the same thing on the new cluster, I get an error:

/root/persistent-hdfs/bin/hadoop distcp <src> <dst>

2013-12-06 20:38:44,808 INFO  mapreduce.Cluster ( - Failed to use org.apache.hadoop.mapred.LocalClientProtocolProvider due to error: Invalid "mapreduce.jobtracker.address" configuration value for LocalJobRunner : ""
2013-12-06 20:38:44,809 ERROR tools.DistCp ( - Exception encountered Cannot initialize Cluster. Please check your configuration for and the correspond server addresses.
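My reading of the error is that the job client can't pick an execution framework: the local provider is rejected because mapreduce.jobtracker.address is an empty string, and apparently no other provider matches either. As a guess (this is not the cluster's actual config, just what I'd expect to make local execution work given the property names in the log), something like this in persistent-hdfs/conf/mapred-site.xml:

```xml
<!-- Hypothetical mapred-site.xml fragment. Setting the framework
     (or the jobtracker address) to "local" should let the
     LocalClientProtocolProvider accept the configuration, so
     distcp can run without a jobtracker or YARN resource manager. -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>local</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.address</name>
    <value>local</value>
  </property>
</configuration>
```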

I'm wondering how the cluster has been configured differently when Hadoop 2 is specified for
the EC2 scripts, and why distcp isn't working here.  Thanks!

-Matt Cheah
