spark-user mailing list archives

From Andy Davidson <>
Subject Re: how to copy local files to hdfs quickly?
Date Sat, 30 Jul 2016 17:26:12 GMT
For lack of a better solution I am using 'AWS s3 copy' to copy my files
locally and 'hadoop fs -put ./tmp/*' to transfer them. In general put works
much better with a smaller number of big files compared to a large number of
small files.

Your mileage may vary.
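
Since put seems to prefer fewer, bigger files, one option is to merge the small files locally before uploading. Below is a rough sketch of that idea, not something I have benchmarked. It assumes the files are line-delimited JSON (one record per line), so that plain concatenation stays valid. The function name merge_small_files, the merged-NNNNN.json naming, and the 128 MB default target size are all made up for illustration.

```python
import glob
import os


def merge_small_files(src_dir, dst_dir, pattern="*.json",
                      target_bytes=128 * 1024 * 1024):
    """Concatenate many small line-delimited files into fewer large ones.

    Hypothetical helper: returns the merged file paths in the order written.
    """
    os.makedirs(dst_dir, exist_ok=True)
    merged_paths = []
    out = None
    written = 0
    # The file list is built once, before any output file is created.
    for path in sorted(glob.glob(os.path.join(src_dir, pattern))):
        # Start a new output file when none is open or the current one is full.
        if out is None or written >= target_bytes:
            if out is not None:
                out.close()
            merged = os.path.join(dst_dir, "merged-%05d.json" % len(merged_paths))
            merged_paths.append(merged)
            out = open(merged, "wb")
            written = 0
        with open(path, "rb") as src:
            data = src.read()
        # Keep records on separate lines even if a file lacks a trailing newline.
        if data and not data.endswith(b"\n"):
            data += b"\n"
        out.write(data)
        written += len(data)
    if out is not None:
        out.close()
    return merged_paths
```

The merged directory can then be uploaded with the same hadoop fs -put command as before. Whether this actually helps will depend on the file sizes and the cluster.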


From:  Andrew Davidson <>
Date:  Wednesday, July 27, 2016 at 4:25 PM
To:  "user @spark" <>
Subject:  how to copy local files to hdfs quickly?

> I have a Spark Streaming app that saves JSON files to s3:// . It works fine.
> Now I need to calculate some basic summary stats and am running into horrible
> performance problems.
> I want to run a test to see if reading from hdfs instead of s3 makes a
> difference. I am able to quickly copy my data from s3 to a machine in my
> cluster, however hadoop fs -put is painfully slow. Is there a better way to
> copy large data to hdfs?
> I should mention I am not using EMR, i.e. according to AWS support there is
> no way to have '$aws s3' copy a directory to hdfs://
> Hadoop distcp cannot copy files from the local file system.
> Thanks in advance
> Andy
