spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jerry Lam <chiling...@gmail.com>
Subject Poor HDFS Data Locality on Spark-EC2
Date Tue, 04 Aug 2015 22:43:07 GMT
Hi Spark users and developers,

I have been trying to use spark-ec2. After I launched the spark cluster
(1.4.1) with ephemeral hdfs (using hadoop 2.4.0), I tried to execute a job
where the data is stored in the ephemeral hdfs. It does not matter what I
tried to do, there is no data locality at all. For instance, filtering data
and calculating the count of the filter data will always have locality
level "any". I tweaked the configurations spark.locality.wait.* but it does
not seem to care. I'm guessing this is because the hostname cannot be
resolved properly. Does anyone experience this problem before?

Best Regards,

Jerry

Mime
View raw message