spark-user mailing list archives

From Mayur Rustagi <mayur.rust...@gmail.com>
Subject Re: Why RDD is not cached?
Date Wed, 29 Oct 2014 00:52:18 GMT
What is the partition count of the RDD? It's possible that you don't have
enough memory to store the whole RDD on a single machine. Can you try
forcibly repartitioning the RDD and then caching it?
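
Something like this, as a rough sketch (assuming rdd is the RDD from your
snippet; the target of 100 partitions is just a guess to tune for your
data, and with the default MEMORY_ONLY level a partition that does not fit
in an executor's storage memory is simply not stored, so more and smaller
partitions can help):

    val repartitioned = rdd.repartition(100) // shuffle into more, smaller partitions
    repartitioned.cache()                    // mark for caching (MEMORY_ONLY by default)
    repartitioned.count()                    // run an action to actually fill the cache
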
Regards
Mayur

On Tue Oct 28 2014 at 1:19:09 AM shahab <shahab.mokari@gmail.com> wrote:

> I used cache() followed by a count on the RDD to ensure that caching is
> performed.
>
> val rdd = srdd.flatMap(mapProfile_To_Sessions).cache
>
>    val count = rdd.count
>
> // so at this point the RDD should be cached, right?
>
> On Tue, Oct 28, 2014 at 8:35 AM, Sean Owen <sowen@cloudera.com> wrote:
>
>> Did you just call cache()? By itself it does nothing, but once an action
>> requires it to be computed, it should become cached.
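>>
>> A minimal illustration in the shell (any action works, count is just
>> convenient):
>>
>>   val rdd = sc.parallelize(1 to 1000000).cache() // lazy: nothing stored yet
>>   rdd.count() // the first action computes the RDD and fills the cache
>>   rdd.count() // this one is served from memory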
>> On Oct 28, 2014 8:19 AM, "shahab" <shahab.mokari@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I have a standalone Spark setup where each executor is set to have 6.3 GB
>>> of memory; as I am using two workers, there are 12.6 GB of memory and 4
>>> cores in total.
>>>
>>> I am trying to cache an RDD with an approximate size of 3.2 GB, but
>>> apparently it is not cached: I see neither "BlockManagerMasterActor:
>>> Added rdd_XX in memory" in the logs nor any improvement in the
>>> performance of the tasks.
>>>
>>> But why is it not cached when there is enough storage memory?
>>> I tried with smaller RDDs, 1 or 2 GB, and it works; at least I could see
>>> "BlockManagerMasterActor: Added rdd_0_1 in memory" and an improvement in
>>> the results.
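>>>
>>> For what it is worth, one way to check this from the shell besides the
>>> logs (using the rdd above; sc.getRDDStorageInfo is a developer API that
>>> reports what was actually stored, while getStorageLevel only shows the
>>> requested level):
>>>
>>>   println(rdd.getStorageLevel)           // the level that was requested
>>>   sc.getRDDStorageInfo.foreach(println)  // cached partitions and sizes per RDD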
>>>
>>> Any idea what I am missing in my settings, or... ?
>>>
>>> thanks,
>>> /Shahab
>>>
>>
>
