spark-user mailing list archives

From Chris Fregly <ch...@fregly.com>
Subject Re: Readin from Amazon S3 behaves inconsistently: return different number of lines...
Date Sun, 31 Aug 2014 05:56:14 GMT
Interesting and possibly related blog post from Netflix earlier this year:
http://techblog.netflix.com/2014/01/s3mper-consistency-in-cloud.html
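
If the cause is an eventually consistent directory listing, which is the problem that post is about, one rough way to check is to list the same directory a few times and see whether the set of keys changes between listings. A minimal sketch, not tested against S3: `find_unstable_keys` and `list_keys` are hypothetical names, and `list_keys` stands for whatever function you write to return the object keys under one directory with your S3 client.

```python
from typing import Callable, Iterable, List, Set


def find_unstable_keys(list_keys: Callable[[], Iterable[str]],
                       attempts: int = 3) -> Set[str]:
    """Return keys that show up in some listings but not in all of them.

    list_keys is a hypothetical callable supplied by the caller; it should
    return the object keys currently visible under one directory.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    # Take several independent listings of the same directory.
    listings: List[Set[str]] = [set(list_keys()) for _ in range(attempts)]
    seen_anywhere = set().union(*listings)
    seen_everywhere = set.intersection(*listings)
    # Anything missing from at least one listing is suspect.
    return seen_anywhere - seen_everywhere
```

An empty result only means the listings agreed with each other. It does not prove that every file is visible, so comparing the key count against the expected 1024 per directory is still worthwhile.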


On Fri, Aug 1, 2014 at 8:09 AM, nit <nitinpanj@gmail.com> wrote:

> @sean - I am using the latest code from the master branch, up to commit
> a7d145e98c55fa66a541293930f25d9cdc25f3b4.
>
> In my case I have multiple directories, each with 1024 files (the file
> sizes may differ). For some directories I always get a consistent
> result, and for others I can reproduce the inconsistent behavior.
>
> I am not very familiar with the S3 protocol or the S3 driver in Spark. I am
> wondering: how does the S3 driver verify that all files (and their contents)
> under a directory were read correctly?
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Reading-from-Amazon-S3-directory-via-textFile-api-behaves-inconsistently-tp11092p11170.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
