spark-user mailing list archives

From Guillermo Ortiz <konstt2...@gmail.com>
Subject Re: Caching small Rdd's take really long time and Spark seems frozen
Date Fri, 24 Aug 2018 09:56:53 GMT
Another test I just ran was to execute with local[X], and the problem
doesn't happen there. Could it be a communication problem?

2018-08-23 22:43 GMT+02:00 Guillermo Ortiz <konstt2000@gmail.com>:

> There is a complex DAG before the point where I cache the RDD: there are
> some joins, filters and maps before the data is cached, but most of the
> time it takes almost no time. I could understand it if it took the same
> time every time to process or cache the data. Besides, it seems random, and
> there isn't any weird data in the input.
>
> Another test I tried was disabling caching, and I saw that all the
> microbatches then take the same time, so it seems to be related to
> caching these RDDs.
>
> On Thu, 23 Aug 2018 at 15:29, Sonal Goyal (<sonalgoyal4@gmail.com>)
> wrote:
>
>> How are these small RDDs created? Could the blockage be in computing
>> them rather than in caching them?
>>
>> Thanks,
>> Sonal
>> Nube Technologies <http://www.nubetech.co>
>>
>> <http://in.linkedin.com/in/sonalgoyal>
>>
>>
>>
>> On Thu, Aug 23, 2018 at 6:38 PM, Guillermo Ortiz <konstt2000@gmail.com>
>> wrote:
>>
>>> I use Spark with caching via the persist method. I have several RDDs that
>>> I cache, but some of them are pretty small (about 300 KB). Most of the
>>> time it works well and the whole job usually takes 1 s, but sometimes it
>>> takes about 40 s to store 300 KB in the cache.
>>>
>>> If I go to SparkUI->Cache, I can see the percentage increasing up to
>>> 83% (250 KB) and then it stops for a while. If I check the event timeline
>>> in the Spark UI, I can see that when this happens there is one node where
>>> tasks take a very long time. This node could be any node in the cluster;
>>> it's not always the same one.
>>>
>>> In the Spark executor logs I can see that it takes about 40 s to
>>> store 3.7 KB when this problem occurs:
>>>
>>>     INFO  2018-08-23 12:46:58 Logging.scala:54 -
>>> org.apache.spark.storage.BlockManager: Found block rdd_1705_23 locally
>>>     INFO  2018-08-23 12:47:38 Logging.scala:54 -
>>> org.apache.spark.storage.memory.MemoryStore: Block rdd_1692_7 stored as
>>> bytes in memory (estimated size 3.7 KB, free 1048.0 MB)
>>>     INFO  2018-08-23 12:47:38 Logging.scala:54 -
>>> org.apache.spark.storage.BlockManager: Found block rdd_1692_7 locally
>>>
>>> I have tried MEMORY_ONLY, MEMORY_AND_SER and so on, with the same
>>> results. I have checked disk I/O (although with MEMORY_ONLY I guess that
>>> doesn't make sense) and I can't see any problem. This happens randomly,
>>> but it could be in about 25% of the jobs.
>>>
>>> Any idea what could be happening?
>>>
>>
>>
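
The 40 s pause in the executor log excerpt quoted above can also be found
mechanically instead of by eye. What follows is only a minimal sketch in
Python, assuming every log entry starts with a "LEVEL  YYYY-MM-DD HH:MM:SS"
prefix as in that excerpt and that lines without such a prefix are wrapped
continuations of the previous entry. The names parse_entries and find_gaps,
and the 10 s default threshold, are hypothetical and not part of Spark.

```python
import re
from datetime import datetime

# Assumed prefix, taken from the excerpt above: "LEVEL  YYYY-MM-DD HH:MM:SS rest".
LOG_PREFIX = re.compile(
    r"^\s*[A-Z]+\s+(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(.*)$"
)


def parse_entries(lines):
    """Group raw lines into (timestamp, message) entries.

    Lines without a timestamp prefix are appended to the previous entry,
    because the excerpt wraps each message over several lines.
    """
    entries = []
    for line in lines:
        match = LOG_PREFIX.match(line)
        if match:
            ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            entries.append([ts, match.group(2).strip()])
        elif entries and line.strip():
            entries[-1][1] += " " + line.strip()
    return [(ts, msg) for ts, msg in entries]


def find_gaps(lines, threshold_seconds=10):
    """Return (gap_seconds, previous_message, next_message) for every pause
    between consecutive entries of at least threshold_seconds."""
    entries = parse_entries(lines)
    gaps = []
    for (prev_ts, prev_msg), (ts, msg) in zip(entries, entries[1:]):
        gap = (ts - prev_ts).total_seconds()
        if gap >= threshold_seconds:
            gaps.append((gap, prev_msg, msg))
    return gaps


# The log lines from the message above, without the quote markers.
SAMPLE = """\
    INFO  2018-08-23 12:46:58 Logging.scala:54 -
org.apache.spark.storage.BlockManager: Found block rdd_1705_23 locally
    INFO  2018-08-23 12:47:38 Logging.scala:54 -
org.apache.spark.storage.memory.MemoryStore: Block rdd_1692_7 stored as
bytes in memory (estimated size 3.7 KB, free 1048.0 MB)
    INFO  2018-08-23 12:47:38 Logging.scala:54 -
org.apache.spark.storage.BlockManager: Found block rdd_1692_7 locally
""".splitlines()

for gap, before, after in find_gaps(SAMPLE):
    print(f"{gap:.0f}s pause before: {after}")
```

Running the same function over the full logs of each executor would show
whether the pauses always precede a MemoryStore message or also appear
before other kinds of entries, which may help separate a slow computation
from a slow store.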
