spark-user mailing list archives

From Jeff Zhang <zjf...@gmail.com>
Subject Re: Always two tasks slower than others, and then job fails
Date Fri, 14 Aug 2015 07:36:46 GMT
Data skew? Maybe your partition key has some special value, like null or an
empty string.
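One way to check for this kind of skew is to count records per key and look for a dominant value such as null or "". In Spark this would typically be done with something like `rdd.map(lambda kv: (kv[0], 1)).reduceByKey(add)`; the sketch below illustrates the same diagnostic with plain Python on simulated data (the key names and counts are made up for illustration):

```python
from collections import Counter

def top_keys(records, n=3):
    """Return the n most frequent partition keys with their counts.

    records: iterable of (key, value) pairs, as they would appear
    before a shuffle. A single key holding most of the records means
    one partition (and one task) will do most of the work.
    """
    counts = Counter(key for key, _ in records)
    return counts.most_common(n)

# Simulated skewed data: most records share the key None (e.g. a
# missing join field), so they all hash to the same partition.
records = [(None, i) for i in range(1000)] + \
          [("user_%d" % (i % 10), i) for i in range(100)]
print(top_keys(records, 3))
```

If one key dominates, common fixes are to filter it out before the shuffle or to "salt" it (append a random suffix so its records spread across several partitions, then aggregate again).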

On Fri, Aug 14, 2015 at 11:01 AM, randylu <randylu26@gmail.com> wrote:

>   It is strange that there are always two tasks slower than the others, and
> the corresponding partitions' data are larger, no matter how many partitions
> I use.
>
>
> Executor ID  Address                  Task Time  Tasks  Shuffle Read Size / Records
> 1            slave129.vsvs.com:56691  16 s       1       99.5 MB /  18865432
> *10          slave317.vsvs.com:59281   0 ms      0      413.5 MB / 311001318*
> 100          slave290.vsvs.com:60241  19 s       1      110.8 MB /  27075926
> 101          slave323.vsvs.com:36246  14 s       1      126.1 MB /  25052808
>
>   The task time and record count for Executor 10 seem strange, and the CPUs
> on that node are all 100% busy.
>
>   Has anyone met the same problem? Thanks in advance for any answer!
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Always-two-tasks-slower-than-others-and-then-job-fails-tp24257.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
>
>


-- 
Best Regards

Jeff Zhang
