spark-user mailing list archives

From Andy Davidson <>
Subject how to copy local files to hdfs quickly?
Date Wed, 27 Jul 2016 23:25:12 GMT
I have a Spark Streaming app that saves JSON files to s3://. It works fine.

Now I need to calculate some basic summary stats and am running into
horrible performance problems.

I want to run a test to see if reading from HDFS instead of S3 makes a
difference. I am able to quickly copy the data from S3 to a machine in my
cluster, however `hadoop fs -put` is painfully slow. Is there a better way to
copy large data to HDFS?
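One workaround I have seen suggested is to parallelize the upload rather than
run a single `hadoop fs -put` over the whole directory, since a single put is
effectively one serial stream. A sketch of that idea, assuming the data has
already been copied down to a local directory (the paths and the parallelism
level `-P 8` below are placeholders, not values from my setup):

```shell
#!/bin/sh
# Hypothetical example: upload many local files to HDFS in parallel.
# SRC and DEST are assumed paths -- substitute your own.
SRC=/data/json              # local directory copied down from S3
DEST=/user/me/json          # target directory in HDFS

# Create the target directory, then fan out one `hadoop fs -put`
# per file, running up to 8 copies concurrently via xargs.
hadoop fs -mkdir -p "$DEST"
find "$SRC" -type f -print0 |
  xargs -0 -n 1 -P 8 -I{} hadoop fs -put {} "$DEST"
```

This only helps when the bottleneck is the single client stream rather than
the datanodes or the network, so it may or may not apply here.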

I should mention I am not using EMR, i.e. according to AWS support there is
no way to have `aws s3` copy a directory to hdfs://

Hadoop distcp cannot copy files from the local file system.

Thanks in advance

