spark-user mailing list archives

From Adrien Mogenet <adrien.moge...@contentsquare.com>
Subject Re: How does Spark set task indexes?
Date Wed, 25 May 2016 08:48:52 GMT
Yes, I've seen SPARK-14915 and its related cousin, but I'm not sure it is the
same issue here: our job "properly" ends after 6 attempts.
We'll try with speculation disabled anyway!
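
A minimal sketch of disabling speculation, assuming the job picks up its settings from spark-defaults.conf (the thread does not say how it is actually configured):

```properties
# Assumed location: conf/spark-defaults.conf
# Turn off speculative execution so no duplicate task attempts are launched
spark.speculation    false
```

The same property can be passed for a single run with `--conf spark.speculation=false` on spark-submit.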

On 25 May 2016 at 00:13, Ted Yu <yuzhihong@gmail.com> wrote:

> Have you taken a look at SPARK-14915 ?
>
> On Tue, May 24, 2016 at 1:00 PM, Adrien Mogenet <
> adrien.mogenet@contentsquare.com> wrote:
>
>> Hi,
>>
>> I'm wondering how Spark sets the "index" of a task.
>> I'm asking because we have a job that consistently fails at
>> task index = 421.
>>
>> When we increase the number of partitions, it then fails at index = 4421.
>> Increase it a little more, and it's now 24421.
>>
>> Our job is as simple as "(1) read JSON -> (2) group by session identifier
>> -> (3) write Parquet files" and always fails somewhere in step (3) with a
>> CommitDeniedException. We've identified that some of the trouble is
>> due to uneven data distribution right after step (2), and we are now trying
>> to deepen our understanding of how Spark behaves.
>>
>> We're using Spark 1.5.2 and Scala 2.11, on top of Hadoop 2.6.0.
>>
>> --
>>
>> *Adrien Mogenet*
>> Head of Backend/Infrastructure
>> adrien.mogenet@contentsquare.com
>> http://www.contentsquare.com
>> 50, avenue Montaigne - 75008 Paris
>>
>
>


-- 

*Adrien Mogenet*
Head of Backend/Infrastructure
adrien.mogenet@contentsquare.com
http://www.contentsquare.com
50, avenue Montaigne - 75008 Paris
