spark-user mailing list archives

From Ankur Srivastava <ankur.srivast...@gmail.com>
Subject Issue using S3 bucket from Spark 1.2.1 with hadoop 2.4
Date Tue, 03 Mar 2015 17:44:33 GMT
Hi,

We recently upgraded to the Spark 1.2.1 binary built for Hadoop 2.4. We have
no other dependency on Hadoop jars, except for reading our source files
from S3.

Since upgrading, our reads from S3 have slowed down considerably. For some
jobs, the read from S3 stalls for a long time before it starts.

Is there a known issue with S3, or do we need to update any settings? The
only settings that we are using are:
sc.hadoopConfiguration().set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem");
sc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", someKey);
sc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", someSecret);

Thanks for your help!

- Ankur
