spark-dev mailing list archives

From Grega Kešpret <gr...@celtra.com>
Subject Re: spark.task.maxFailures
Date Mon, 09 Dec 2013 15:35:01 GMT
Hi!

I tried this (by setting spark.task.maxFailures to 1) and it still does not
fail fast. I started a job and, after some time, killed all JVMs running
on one of the two workers. I was expecting the Spark job to fail; however, it
rescheduled the tasks on the worker that was still alive, and the job
succeeded.

Grega
--
*Grega Kešpret*
Analytics engineer

Celtra — Rich Media Mobile Advertising
celtra.com <http://www.celtra.com/> | @celtramobile <http://www.twitter.com/celtramobile>


On Mon, Dec 9, 2013 at 10:43 AM, Grega Kešpret <grega@celtra.com> wrote:

> Hi Reynold,
>
> I submitted a pull request here -
> https://github.com/apache/incubator-spark/pull/245
> Do I need to do anything else (perhaps add a ticket in JIRA)?
>
> Best,
> Grega
> --
> *Grega Kešpret*
>
> Analytics engineer
>
> Celtra — Rich Media Mobile Advertising
> celtra.com <http://www.celtra.com/> | @celtramobile<http://www.twitter.com/celtramobile>
>
>
> On Fri, Nov 29, 2013 at 6:24 PM, Reynold Xin <rxin@apache.org> wrote:
>
>> Looks like a bug to me. Can you submit a pull request?
>>
>>
>>
>> On Fri, Nov 29, 2013 at 2:02 AM, Grega Kešpret <grega@celtra.com> wrote:
>>
>> > Looking at
>> > http://spark.incubator.apache.org/docs/latest/configuration.html
>> > the docs say:
>> > Number of individual task failures before giving up on the job. Should be
>> > greater than or equal to 1. Number of allowed retries = this value - 1.
>> >
>> > However, looking at the code
>> >
>> > https://github.com/apache/incubator-spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala#L532
>> >
>> > if I set spark.task.maxFailures to 1, this means that the job will fail
>> > after a task fails for the second time. Shouldn't this line be corrected to
>> > if (numFailures(index) >= MAX_TASK_FAILURES) {
>> > ?
>> >
>> > I can open a pull request if this is the case.
>> >
>> > Thanks,
>> > Grega
>> > --
>> > *Grega Kešpret*
>> > Analytics engineer
>> >
>> > Celtra — Rich Media Mobile Advertising
>> > celtra.com <http://www.celtra.com/> | @celtramobile <http://www.twitter.com/celtramobile>
>> >
>>
>
>
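
The off-by-one discussed in the quoted thread can be illustrated with a small stand-alone model. This is only a sketch, not the actual ClusterTaskSetManager code: the object name, the helper failuresUntilAbort, and both predicates are hypothetical, and the strict ">" comparison is an assumption inferred from the behaviour described in the message (a job failing only on the second failure when the setting is 1).

```scala
// Hypothetical stand-alone model of the failure check; not Spark source code.
object MaxFailuresSketch {

  // Counts how many times a single task must fail before the job is aborted,
  // given a predicate (numFailures, maxTaskFailures) => shouldAbort.
  def failuresUntilAbort(maxTaskFailures: Int,
                         shouldAbort: (Int, Int) => Boolean): Int = {
    var numFailures = 0
    var aborted = false
    while (!aborted) {
      numFailures += 1 // the task fails once more
      aborted = shouldAbort(numFailures, maxTaskFailures)
    }
    numFailures
  }

  // Assumed pre-fix comparison (inferred from the message, not copied from the code).
  val strictCheck: (Int, Int) => Boolean = (n, max) => n > max

  // Comparison proposed in the message.
  val proposedCheck: (Int, Int) => Boolean = (n, max) => n >= max

  def main(args: Array[String]): Unit = {
    for (max <- 1 to 4) {
      val strict = failuresUntilAbort(max, strictCheck)
      val proposed = failuresUntilAbort(max, proposedCheck)
      println(s"maxFailures=$max strict=$strict proposed=$proposed")
    }
  }
}
```

Under these assumptions, the strict comparison aborts one failure later than the documented "allowed retries = this value - 1", while the ">=" comparison matches the documentation. The model covers only repeated failures of one task; it says nothing about how tasks from a lost worker are counted.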
