spark-user mailing list archives

From Sampo Niskanen <sampo.niska...@wellmo.com>
Subject Re: Caching causes later actions to get stuck
Date Mon, 02 Nov 2015 06:33:08 GMT
Hi,

Any ideas what's going wrong or how to fix it?  Do I have to downgrade to
0.9.x to be able to use Spark?


Best regards,

    Sampo Niskanen
    Lead developer / Wellmo
    sampo.niskanen@wellmo.com
    +358 40 820 5291


On Fri, Oct 30, 2015 at 4:57 PM, Sampo Niskanen <sampo.niskanen@wellmo.com>
wrote:

> Hi,
>
> I'm facing a problem where Spark performs an action on a cached RDD
> correctly the first time it is run, but running it again immediately
> afterwards (or running any action that depends on that RDD) causes it
> to get stuck.
>
> I'm using a MongoDB connector to fetch all documents from a collection
> into an RDD, which I then cache (though according to the warnings it
> doesn't fully fit in memory).  The first action on it always succeeds,
> but later actions fail.  I just upgraded from Spark 0.9.x to 1.5.1 and
> didn't have this problem with the older version.
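>
> One thing I've been meaning to try (just a sketch, untested on our
> data) is persisting with a storage level that spills to disk instead of
> the default memory-only cache, so partitions that don't fit in memory
> are written to disk rather than dropped and recomputed from MongoDB on
> the next action:
>
>     // Untested sketch: spill-to-disk persistence instead of cache()
>     import org.apache.spark.storage.StorageLevel
>     analyticsRDD.persist(StorageLevel.MEMORY_AND_DISK)
>     analyticsRDD.count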
>
>
> The output I get:
>
>
> scala> analyticsRDD.cache
> res10: analyticsRDD.type = MapPartitionsRDD[84] at map at Mongo.scala:69
>
> scala> analyticsRDD.count
> [Stage 2:=================================================>     (472 + 8)
> / 524]15/10/30 14:20:00 WARN MemoryStore: Not enough space to cache
> rdd_84_469 in memory! (computed 13.0 MB so far)
> 15/10/30 14:20:00 WARN MemoryStore: Not enough space to cache rdd_84_470
> in memory! (computed 12.1 MB so far)
> 15/10/30 14:20:00 WARN MemoryStore: Not enough space to cache rdd_84_476
> in memory! (computed 5.6 MB so far)
> ...
> 15/10/30 14:20:06 WARN MemoryStore: Not enough space to cache rdd_84_522
> in memory! (computed 5.3 MB so far)
> [Stage 2:======================================================>(522 + 2)
> / 524]15/10/30 14:20:06 WARN MemoryStore: Not enough space to cache
> rdd_84_521 in memory! (computed 13.9 MB so far)
> res11: Long = 7754957
>
>
> scala> analyticsRDD.count
> [Stage 3:=================================================>     (474 + 0)
> / 524]
>
>
> *** Restart Spark ***
>
> scala> analyticsRDD.count
> res10: Long = 7755043
>
>
> scala> analyticsRDD.count
> res11: Long = 7755050
>
>
>
> The cached RDD always gets stuck at the same point.  I tried enabling full
> debug logging, but couldn't make out anything useful.
>
>
> I'm also facing another issue with loading a lot of data from MongoDB,
> which might be related, but the error is different:
> https://groups.google.com/forum/#!topic/mongodb-user/Knj406szd74
>
>
> Any ideas?
>
>
>     Sampo Niskanen
>     Lead developer / Wellmo
>     sampo.niskanen@wellmo.com
>     +358 40 820 5291
>
>
