spark-user mailing list archives

From Nicholas Chammas <nicholas.cham...@gmail.com>
Subject Best practices: Parallelized write to / read from S3
Date Mon, 31 Mar 2014 15:49:05 GMT
Howdy-doody,

I have a single, very large file sitting in S3 that I want to read in with
sc.textFile(). What are the best practices for reading in this file as
quickly as possible? How do I parallelize the read as much as possible?
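
For concreteness, here's roughly what I have in mind for the read side (the bucket name, path, scheme, and partition count are made up, and I'm assuming S3 credentials are already configured for the cluster):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("s3-read"))

    // The second argument asks Spark for a minimum number of input partitions.
    // Plain text on S3 is splittable, so more partitions should mean more
    // concurrent readers; a gzipped file is not splittable and would come back
    // as a single partition regardless.
    val raw = sc.textFile("s3n://my-bucket/very-large-file.txt", 200)

    // If the file still arrives in too few partitions, repartition before doing
    // heavy work so downstream stages can use the whole cluster.
    val parallel = raw.repartition(200)

Is that the right approach, or is there a better way to drive up read parallelism?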

Similarly, say I have a single, very large RDD sitting in memory that I
want to write out to S3 with RDD.saveAsTextFile(). What are the best
practices for writing it out as quickly as possible?
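
Again for concreteness, a rough sketch of the write side (the bucket, output path, and partition count are hypothetical, and `records` stands in for my actual RDD). My understanding is that saveAsTextFile writes one part-NNNNN object per partition, and the partitions are written concurrently, so the write parallelism is just the partition count at save time:

    import org.apache.spark.rdd.RDD

    def saveInParallel(records: RDD[String]): Unit = {
      // Repartition (or coalesce) first if the RDD has too few partitions to
      // keep every executor busy, or far too many tiny ones.
      records
        .repartition(200)
        .saveAsTextFile("s3n://my-bucket/output/")
    }

Is tuning the partition count before the save the main lever here, or are there other settings I should be looking at?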

Nick



