spark-user mailing list archives

From Jerry Lam <chiling...@gmail.com>
Subject Re: Accessing S3 files with s3n://
Date Sun, 09 Aug 2015 13:01:07 GMT
Hi Akshat,

Is there a particular reason you don't use s3a? From my experience, s3a performs much better
than the other S3 filesystem implementations. I believe the inefficiency comes from the
implementation of the s3n interface.
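
For what it's worth, a minimal sketch of the switch (assuming Spark 1.4+ and Hadoop 2.6+ with
the hadoop-aws and aws-java-sdk jars on the classpath; the bucket and paths below are just
placeholders):

    // Scala, e.g. in spark-shell.
    // Point the Hadoop layer at the S3A implementation and supply credentials.
    sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    sc.hadoopConfiguration.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
    sc.hadoopConfiguration.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))

    // Then read with an s3a:// URI instead of s3n://.
    val df = sqlContext.read.parquet("s3a://my-bucket/path/to/data")
    df.count()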

Best Regards,

Jerry

Sent from my iPhone

> On 9 Aug, 2015, at 5:48 am, Akhil Das <akhil@sigmoidanalytics.com> wrote:
> 
> Depends on which operation you are doing. If you are doing a .count() on a Parquet
> file, it might not download the entire file, I think, but if you do a .count() on a
> normal text file it might pull the entire file.
> 
> Thanks
> Best Regards
> 
>> On Sat, Aug 8, 2015 at 3:12 AM, Akshat Aranya <aaranya@gmail.com> wrote:
>> Hi,
>> 
>> I've been trying to track down some problems with Spark reads being very slow with
>> s3n:// URIs (NativeS3FileSystem).  After some digging around, I realized that this
>> filesystem implementation fetches the entire file, which isn't really a Spark problem,
>> but it really slows things down when trying to just read headers from a Parquet file
>> or just create partitions in the RDD.  Is this something that others have observed
>> before, or am I doing something wrong?
>> 
>> Thanks,
>> Akshat
> 

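A rough illustration of the difference discussed above (hypothetical paths; how much data
actually gets pulled depends on the filesystem implementation and the Spark/Parquet versions):

    // Scala: both jobs count records, but over s3a:// a Parquet count can rely on
    // ranged reads of footers/row-group metadata, while a text count has to stream
    // the whole object either way.
    val parquetRows = sqlContext.read.parquet("s3a://my-bucket/events-parquet").count()
    val textLines   = sc.textFile("s3a://my-bucket/events.log").count()

Under s3n:// (NativeS3FileSystem), even the metadata-heavy Parquet reads can end up fetching
whole objects, which matches the slowdown described in the original message; s3a supports
ranged (seek-based) reads and avoids much of that cost.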