spark-user mailing list archives

From Ankur Srivastava <ankur.srivast...@gmail.com>
Subject Re: Issue using S3 bucket from Spark 1.2.1 with hadoop 2.4
Date Tue, 03 Mar 2015 18:06:37 GMT
Thanks a lot Ted!!

On Tue, Mar 3, 2015 at 9:53 AM, Ted Yu <yuzhihong@gmail.com> wrote:

> If you can use the Hadoop 2.6.0 binary, you can use s3a.
>
> s3a is being polished in the upcoming 2.7.0 release:
> https://issues.apache.org/jira/browse/HADOOP-11571
>
> Cheers
>
> On Tue, Mar 3, 2015 at 9:44 AM, Ankur Srivastava <
> ankur.srivastava@gmail.com> wrote:
>
>> Hi,
>>
>> We recently upgraded to the Spark 1.2.1 / Hadoop 2.4 binary. We have no
>> other dependency on the Hadoop jars, except for reading our source
>> files from S3.
>>
>> Since the upgrade, our reads from S3 have slowed down considerably. For
>> some jobs, the read from S3 stalls for a long time before it starts.
>>
>> Is there a known issue with S3, or do we need to update any settings? The
>> only settings that we are using are:
>> sc.hadoopConfiguration().set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem");
>> sc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", someKey);
>> sc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", someSecret);
>>
>> Thanks for help!!
>>
>> - Ankur
>>
>
>
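
For reference, the s3a suggestion above could be sketched roughly as follows. This is an illustration, not something posted in the thread: it assumes the Hadoop 2.6.0 binaries with the s3a connector on the classpath, and it uses the s3a property names as I understand them from Hadoop 2.6.0 (fs.s3a.impl, fs.s3a.access.key, fs.s3a.secret.key). The class and method names are hypothetical, and the helper only builds the settings so that it runs without Spark.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: collects the s3a counterparts of the three
// fs.s3n.* settings quoted in the original message.
public class S3aSettings {

    static Map<String, String> s3aSettings(String accessKey, String secretKey) {
        Map<String, String> settings = new LinkedHashMap<>();
        // File system implementation for the s3a:// scheme.
        settings.put("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
        // Credentials; these replace fs.s3n.awsAccessKeyId and
        // fs.s3n.awsSecretAccessKey.
        settings.put("fs.s3a.access.key", accessKey);
        settings.put("fs.s3a.secret.key", secretKey);
        return settings;
    }

    public static void main(String[] args) {
        // Placeholder values; real credentials would come from your own config.
        Map<String, String> settings = s3aSettings("someKey", "someSecret");
        // With a JavaSparkContext named sc, each entry would be applied as:
        //   sc.hadoopConfiguration().set(entry.getKey(), entry.getValue());
        for (String name : settings.keySet()) {
            System.out.println(name); // print property names only, not secrets
        }
    }
}
```

Input paths would then use the s3a:// scheme instead of s3n://. Whether this helps with the stalled reads is not confirmed anywhere in the thread.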
