spark-user mailing list archives

From Aaron Davidson <ilike...@gmail.com>
Subject Re: S3NativeFileSystem inefficient implementation when calling sc.textFile
Date Sun, 30 Nov 2014 19:03:05 GMT
Note that s3a does not appear to solve the original problems in this
thread, which are either on the Spark side or come from the fact that
listing metadata in S3 is slow simply because every call goes over the
network.

On Sun, Nov 30, 2014 at 10:07 AM, David Blewett <david@dawninglight.net>
wrote:

> You might be interested in the new s3a filesystem in Hadoop 2.6.0 [1].
>
> 1.
> https://issues.apache.org/jira/plugins/servlet/mobile#issue/HADOOP-10400
> On Nov 26, 2014 12:24 PM, "Aaron Davidson" <ilikerps@gmail.com> wrote:
>
>> Spark has a known problem where it makes a serial pass over the metadata
>> of a large number of small files to determine the partition information
>> before starting the job. Switching the FS implementation will probably
>> not fix this.
>>
>> However, you can change the FS being used like so (prior to the first
>> usage):
>>   sc.hadoopConfiguration.set("fs.s3n.impl",
>>     "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
>>
>> On Wed, Nov 26, 2014 at 1:47 AM, Tomer Benyamini <tomer.ben@gmail.com>
>> wrote:
>>
>>> Thanks, Lalit. Setting the access and secret keys in the configuration
>>> works even when calling sc.textFile. Is there a way to select at runtime,
>>> through the Hadoop configuration, which Hadoop S3 native filesystem
>>> implementation is used?
>>>
>>> Thanks,
>>> Tomer
>>>
>>> On Wed, Nov 26, 2014 at 11:08 AM, lalit1303 <lalit@sigmoidanalytics.com>
>>> wrote:
>>>
>>>>
>>>> You can try creating a Hadoop Configuration and setting the S3
>>>> configuration on it, i.e. the access keys etc.
>>>> Then, to read files from S3, use newAPIHadoopFile and pass it the config
>>>> object along with the key and value classes.
>>>>
>>>> -----
>>>> Lalit Yadav
>>>> lalit@sigmoidanalytics.com
>>>
>>
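
The two suggestions quoted above (setting the S3 keys and the filesystem
implementation on a Hadoop Configuration, then passing that object to
newAPIHadoopFile) could be sketched in Scala roughly as follows. This is a
minimal sketch, not code from the thread: the bucket path and the environment
variable names are hypothetical, and the fs.s3n.* property names are assumed
to match the Hadoop version in use.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.{SparkConf, SparkContext}

object S3ReadSketch {
  def main(args: Array[String]): Unit = {
    // Master URL and other settings are expected to come from spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("s3-read-sketch"))

    // Copy the context's Hadoop configuration so the S3 settings below
    // apply only to this read.
    val hadoopConf = new Configuration(sc.hadoopConfiguration)

    // Assumption: credentials are supplied through these environment variables.
    hadoopConf.set("fs.s3n.awsAccessKeyId", sys.env("AWS_ACCESS_KEY_ID"))
    hadoopConf.set("fs.s3n.awsSecretAccessKey", sys.env("AWS_SECRET_ACCESS_KEY"))

    // Select the filesystem implementation for the s3n:// scheme,
    // as in the snippet quoted in the thread.
    hadoopConf.set("fs.s3n.impl",
      "org.apache.hadoop.fs.s3native.NativeS3FileSystem")

    // Hypothetical bucket and prefix, for illustration only.
    val path = "s3n://example-bucket/logs/*"

    // Pass the configuration explicitly, along with the input format
    // and the key and value classes.
    val lines = sc.newAPIHadoopFile(
        path,
        classOf[TextInputFormat],
        classOf[LongWritable],
        classOf[Text],
        hadoopConf)
      // Hadoop may reuse Text instances, so convert to String right away.
      .map { case (_, text) => text.toString }

    println(lines.count())
    sc.stop()
  }
}
```

As noted in the thread, choosing the filesystem implementation this way is
not expected to speed up the serial metadata pass over many small files.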
