spark-user mailing list archives

From Aaron Davidson <ilike...@gmail.com>
Subject Re: distinct on huge dataset
Date Sun, 23 Mar 2014 16:56:22 GMT
Andrew, this should be fixed in 0.9.1, assuming it is the same hash
collision error we found there.

Kane, is it possible your bigger data is corrupt, such that any
operation on it fails?
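
One rough way to check, as a sketch only: run an action that needs no
shuffle (a plain count) over the big file, with spark.shuffle.spill turned
off as Andrew describes below. The path, app name, and master are
placeholders, not values from this thread. Note that with spilling disabled
a large shuffle can run out of memory, so this is for diagnosis rather than
a permanent setting.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SpillCheck {
  def main(args: Array[String]): Unit = {
    // Hypothetical input path; pass the real location as the first argument.
    val inputPath = if (args.nonEmpty) args(0) else "hdfs:///path/to/big-file"

    val conf = new SparkConf()
      .setMaster("local[2]")                // assumption: replace with your cluster master URL
      .setAppName("spill-check")            // hypothetical app name
      .set("spark.shuffle.spill", "false")  // the 0.9.0 workaround mentioned in this thread

    val sc = new SparkContext(conf)
    try {
      // count() reads every record but does not shuffle, so a failure here
      // suggests a problem with the input rather than with spilling.
      val n = sc.textFile(inputPath).count()
      println(s"read $n lines from $inputPath")
    } finally {
      sc.stop()
    }
  }
}
```

If the count succeeds but distinct or map still fails, the input itself is
probably readable and the problem lies elsewhere.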


On Sat, Mar 22, 2014 at 10:39 PM, Andrew Ash <andrew@andrewash.com> wrote:

> FWIW I've seen correctness errors with spark.shuffle.spill on 0.9.0 and
> have it disabled now. The specific error behavior was that a join would
> consistently return one count of rows with spill enabled and another count
> with it disabled.
>
> On Mar 22, 2014 1:52 PM, "Kane" <kane.isturm@gmail.com> wrote:
>
>> But I was wrong - map also fails on the big file, and setting
>> spark.shuffle.spill doesn't help. Map fails with the same error.
>
