spark-user mailing list archives

From Peng Cheng <pc...@uowmail.edu.au>
Subject Re: How to enable fault-tolerance?
Date Mon, 09 Jun 2014 18:28:14 GMT
Oh, and to make things worse, they forgot '\*' in their regex.
Am I the first to encounter this problem?
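
To illustrate, here is a minimal sketch in plain Scala (no Spark dependency). The two patterns are copied from the SparkContext.scala excerpt quoted below; the object name MasterUrlPatterns and the describe helper are made up for this example and are not part of Spark. If I read the patterns right, the first one does not allow whitespace right after '[' or before ']', while the second one does.

```scala
object MasterUrlPatterns {
  // Patterns copied verbatim from the quoted SparkContext.scala excerpt.
  val LOCAL_N_FAILURES_REGEX = """local\[([0-9]+)\s*,\s*([0-9]+)\]""".r
  val LOCAL_CLUSTER_REGEX =
    """local-cluster\[\s*([0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*]""".r

  // Hypothetical helper (not a Spark API): report how a master URL would be
  // split by the two patterns. A `case` on a Regex requires a full-string match.
  def describe(master: String): String = master match {
    case LOCAL_N_FAILURES_REGEX(threads, maxRetries) =>
      s"local: threads=$threads, maxRetries=$maxRetries"
    case LOCAL_CLUSTER_REGEX(workers, cores, memory) =>
      s"local-cluster: workers=$workers, cores=$cores, memory=$memory"
    case _ =>
      "no match"
  }

  def main(args: Array[String]): Unit = {
    // Sample master URLs chosen for illustration only.
    val samples = Seq(
      "local[4,2]",
      "local[4 , 2]",
      "local[ 4, 2 ]",
      "local-cluster[ 2, 1, 512 ]"
    )
    samples.foreach(m => println(s"$m -> ${describe(m)}"))
  }
}
```

With these patterns, "local[ 4, 2 ]" falls through to "no match" because of the spaces next to the brackets, while "local-cluster[ 2, 1, 512 ]" is accepted.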

On Mon 09 Jun 2014 02:24:43 PM EDT, Peng Cheng wrote:
> Thanks a lot! That was a very quick response. Somebody has definitely
> encountered the same problem before and added two hidden modes to
> masterURL:
>
> (from SparkContext.scala, line 1431)
>
>    // Regular expression for local[N, maxRetries], used in tests with failing tasks
>    val LOCAL_N_FAILURES_REGEX = """local\[([0-9]+)\s*,\s*([0-9]+)\]""".r
>    // Regular expression for simulating a Spark cluster of [N, cores, memory] locally
>    val LOCAL_CLUSTER_REGEX =
>      """local-cluster\[\s*([0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*]""".r
>
> Unfortunately they never made it into the documentation, and the
> config parameters are now scattered across two different places
> (masterURL and $spark.task.maxFailures).
> I'm thinking of adding a new config parameter,
> $spark.task.maxLocalFailures, to override the default of 1. What do
> you think?
>
> Thanks again buddy.
>
> Yours Peng
>
> On Mon 09 Jun 2014 01:33:45 PM EDT, Aaron Davidson wrote:
>> Looks like your problem is local mode:
>> https://github.com/apache/spark/blob/640f9a0efefd42cff86aecd4878a3a57f5ae85fa/core/src/main/scala/org/apache/spark/SparkContext.scala#L1430
>>
>>
>> For some reason, someone decided not to do retries when running in
>> local mode. Not exactly sure why, feel free to submit a JIRA on this.
>>
>>
>> On Mon, Jun 9, 2014 at 8:59 AM, Peng Cheng <pc175@uow.edu.au> wrote:
>>
>>     I speculate that Spark will only retry on exceptions that are
>>     registered with TaskSetScheduler, so a definitely-will-fail task
>>     will fail quickly without taking more resources. However, I haven't
>>     found any documentation or web page on it.
>>
