spark-user mailing list archives

From Karlson <ksonsp...@siberie.de>
Subject Re: Spark stages very slow to complete
Date Tue, 02 Jun 2015 07:18:49 GMT
Hi, the code is a few hundred lines of Python. I can try to put together a
minimal example as soon as I find the time, though. Any ideas until then?

> Would you mind posting the code?
> On 2 Jun 2015 00:53, "Karlson" <ksonspark@siberie.de> wrote:
> 
>> Hi,
>> 
>> In all of my PySpark jobs that become somewhat more involved, some stages
>> take a very long time to complete and sometimes do not complete at all.
>> This clearly correlates with the size of my input data. Looking at the
>> stage details for one such stage, I am wondering where Spark spends all
>> this time. Take this table of the stage's task metrics, for example:
>> 
>> Metric                     Min          25th percentile  Median       75th percentile  Max
>> Duration                   1.4 min      1.5 min          1.7 min      1.9 min          2.3 min
>> Scheduler Delay            1 ms         3 ms             4 ms         5 ms             23 ms
>> Task Deserialization Time  1 ms         2 ms             3 ms         8 ms             22 ms
>> GC Time                    0 ms         0 ms             0 ms         0 ms             0 ms
>> Result Serialization Time  0 ms         0 ms             0 ms         0 ms             1 ms
>> Getting Result Time        0 ms         0 ms             0 ms         0 ms             0 ms
>> Input Size / Records       23.9 KB / 1  24.0 KB / 1      24.1 KB / 1  24.1 KB / 1      24.3 KB / 1
>> 
>> Why is the overall duration almost 2 min? Where is all this time spent
>> when no progress of the stage is visible? The progress bar simply displays
>> 0 succeeded tasks for a very long time before sometimes slowly progressing.
>> 
>> Also, the name of the stage displayed above is `javaToPython at null:-1`,
>> which I find very uninformative. I don't even know which action exactly is
>> responsible for this stage. Does anyone experience similar issues or have
>> any advice for me?
>> 
>> Thanks!
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>> For additional commands, e-mail: user-help@spark.apache.org
>> 
>> 
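As a rough check on the question of where the time goes, the median column of the task metrics table quoted above can be added up in plain Python. This is only a sketch of the arithmetic: the dictionary and the helper name `unaccounted_ms` are made up for illustration, and the values are copied from the table after converting minutes to milliseconds.

```python
# Median column of the quoted task-metrics table, in milliseconds.
# The dictionary layout and the helper name are illustrative assumptions.
median_ms = {
    "Duration": 102 * 1000,  # 1.7 min
    "Scheduler Delay": 4,
    "Task Deserialization Time": 3,
    "GC Time": 0,
    "Result Serialization Time": 0,
    "Getting Result Time": 0,
}


def unaccounted_ms(metrics):
    """Return the part of Duration not covered by the listed overhead rows."""
    overhead = sum(value for name, value in metrics.items() if name != "Duration")
    return metrics["Duration"] - overhead


remaining = unaccounted_ms(median_ms)
share = remaining / median_ms["Duration"]
print(f"{remaining} ms ({share:.2%}) of the median task is not listed overhead")
```

The listed overheads sum to only a few milliseconds, so nearly all of the median task duration falls outside them. That remainder is presumably the task's own computation (for PySpark this would include time spent in the Python worker), although the table alone cannot confirm that.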
