spark-user mailing list archives

From Nirav Patel <npa...@xactlycorp.com>
Subject Re: Executor Lost error
Date Tue, 04 Oct 2016 15:35:11 GMT
A few pointers in addition:

1) Executors can also get lost if they hang in GC and can't respond to the
driver within the timeout. That should show up in the executor logs, though.
2) --conf "spark.shuffle.memoryFraction=0.8" is a very high shuffle
fraction. You should check the Event Timeline in the UI and the executor
logs to see whether it's failing on shuffle read, during computation, or on
shuffle write.
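
To illustrate why 0.8 looks high, here is a rough sketch of the static
memory split used by Spark before 1.6. The default fractions and safety
factors are the documented Spark 1.x defaults as far as I recall them; the
10 GB heap is a made-up figure, since the executor memory was not posted,
and the function name is just for illustration.

```python
# Rough model of the legacy (pre-1.6) static memory split on one executor.
# Assumed defaults: spark.shuffle.memoryFraction=0.2,
# spark.shuffle.safetyFraction=0.8, spark.storage.memoryFraction=0.6,
# spark.storage.safetyFraction=0.9.
def legacy_memory_budget(heap_gb, shuffle_fraction=0.2, storage_fraction=0.6,
                         shuffle_safety=0.8, storage_safety=0.9):
    """Return the approximate shuffle and storage budgets in GB."""
    shuffle_gb = heap_gb * shuffle_fraction * shuffle_safety
    storage_gb = heap_gb * storage_fraction * storage_safety
    return {
        "shuffle_gb": shuffle_gb,
        "storage_gb": storage_gb,
        # Negative means the two regions together over-commit the heap.
        "remaining_gb": heap_gb - shuffle_gb - storage_gb,
    }

# Hypothetical 10 GB executor heap with the shuffle fraction from the thread.
budget = legacy_memory_budget(heap_gb=10.0, shuffle_fraction=0.8)
print(budget)
```

If the storage fraction is left at its default, the shuffle and storage
regions together claim more than the heap, leaving nothing for user code,
which would be consistent with long GC pauses or OOM.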

We supply 80GB of RAM for some of our Spark workloads (< 2B records). We
use Spark 1.5. You can try Spark 2.0 with Datasets if you haven't already.
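
On the partition count point in the quoted reply below, a back-of-the-envelope
sizing might look like this. The 128 MB target per partition is only an
assumed starting point, not something from this thread, and the helper name
is hypothetical. On-disk size is also not the same as in-memory size, so
treat the result as a lower bound to tune from.

```python
import math

def suggested_partitions(input_bytes, target_partition_bytes=128 * 1024**2):
    """Estimate a partition count so each partition holds roughly
    target_partition_bytes of input (assumed target: 128 MB)."""
    return max(1, math.ceil(input_bytes / target_partition_bytes))

one_tb = 1024**4  # ~1 TB input, as in the original question
n = suggested_partitions(one_tb)
print(n)
```

The result could then be passed to something like rdd.repartition(n) or
used as the numPartitions argument of the shuffle operation.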



On Tue, Oct 4, 2016 at 6:39 AM, Yong Zhang <java8964@hotmail.com> wrote:

> You should check your executor log to identify the reason. My guess is
> that the executor is dead due to OOM.
>
>
> If that is the reason, then you need to tune your executor memory setting,
> or, more importantly, your partition count, to make sure you have enough
> memory to handle the data in each partition.
>
>
> Yong
>
>
> ------------------------------
> *From:* Punit Naik <naik.punit44@gmail.com>
> *Sent:* Monday, October 3, 2016 8:07 PM
> *To:* user
> *Subject:* Executor Lost error
>
> Hi All
>
> I am trying to run a program on a large dataset (~ 1TB). I have already
> tested the code on a small amount of data and it works fine. But what I
> noticed is that the job fails if the input is large. It was giving me
> errors regarding akka actor disassociation, which I fixed by increasing the
> timeouts.
> But now I am getting errors like "executor lost" and "executor lost
> failure" which I can't seem to figure out. These are my current
> configs:
>
> --conf "spark.network.timeout=30000"
> --conf "spark.core.connection.ack.wait.timeout=30000"
> --conf "spark.akka.timeout=30000"
> --conf "spark.akka.askTimeout=30000"
> --conf "spark.akka.frameSize=1000"
> --conf "spark.storage.blockManagerSlaveTimeoutMs=600000"
> --conf "spark.network.timeout=600"
> --conf "spark.shuffle.memoryFraction=0.8"
> --conf "spark.driver.maxResultSize=16g"
> --conf "spark.driver.cores=10"
> --conf "spark.driver.memory=10g"
>
> Can anyone tell me any more configs to circumvent this "executor lost" and
> "executor lost failure" error?
>
> --
> Thank You
>
> Regards
>
> Punit Naik
>
