spark-user mailing list archives

From "fightfate@163.com" <fightf...@163.com>
Subject Re: Re: rdd.cache() not working ?
Date Wed, 01 Apr 2015 08:09:43 GMT
Hi all

Thanks a lot for capturing this.

We are now using the 1.3.0 release. We tested with both the prebuilt Spark
distribution and a build compiled from source targeting our CDH components,
and in neither case did the cached RDD show up as expected. However, if we
create a DataFrame from the person RDD and call sqlContext.cacheTable, we
can see the cached results. We are not sure what is happening here. If anyone
can reproduce this issue, please let me know.
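
For reference, the DataFrame path described above might look like the following minimal sketch. It assumes Spark 1.3.x with a local master; the object name CacheTableSketch, the table name "person", and the in-memory sample rows (used in place of the HDFS file) are illustrative and not taken from the original session.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object CacheTableSketch {
  case class Person(id: Int, col1: String)

  // Returns (table is marked as cached, row count seen through the table).
  def run(sc: SparkContext): (Boolean, Long) = {
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._ // brings toDF() into scope for RDDs of case classes

    // Assumption: a small in-memory sample stands in for person.txt on HDFS.
    val person = sc.parallelize(Seq("1,a", "2,b", "3,c"))
      .map(_.split(","))
      .map(p => Person(p(0).trim.toInt, p(1)))

    val df = person.toDF()
    df.registerTempTable("person")   // hypothetical table name
    sqlContext.cacheTable("person")  // marks the table for in-memory columnar caching

    // An action on the table materializes the cached data.
    val rows = sqlContext.table("person").count()
    (sqlContext.isCached("person"), rows)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("cache-table-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try println(run(sc)) finally sc.stop()
  }
}
```

After the count, the table should be listed under Storage in the web UI, which matches what we observed with cacheTable.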

Thanks,
Sun



fightfate@163.com
 
From: Sean Owen
Date: 2015-04-01 15:54
To: Yuri Makhno
CC: fightfate@163.com; Taotao.Li; user
Subject: Re: Re: rdd.cache() not working ?
No, cache() changes the bookkeeping of the existing RDD. Although it
returns a reference, it works to just call "person.cache".
 
I can't reproduce this. When I try to cache an RDD and then count it,
it is persisted in memory and I see it in the web UI. Something else
must be different about what's being executed.
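
One way to check both points outside the web UI is sketched below. It assumes Spark 1.3.x running with a local master; the object name CacheCheck, the helper cachedPartitions, and the in-memory sample rows are illustrative only. It uses sc.getRDDStorageInfo, a developer API, to ask which partitions are actually stored.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object CacheCheck {
  case class Person(id: Int, col1: String)

  // Number of partitions of the given RDD currently reported as stored.
  private def cachedPartitions(sc: SparkContext, rddId: Int): Int =
    sc.getRDDStorageInfo.filter(_.id == rddId).map(_.numCachedPartitions).sum

  // Returns (cache() returned the same object, storage level,
  //          cached partitions before the action, cached partitions after it).
  def run(sc: SparkContext): (Boolean, StorageLevel, Int, Int) = {
    // Assumption: a small in-memory sample stands in for person.txt on HDFS.
    val person = sc.parallelize(Seq("1, a", "2, b", "3, c"), 2)
      .map(_.split(","))
      .map(p => Person(p(0).trim.toInt, p(1)))

    val returned = person.cache()          // marks the existing RDD for caching
    val sameReference = returned eq person // cache() hands back the same instance
    val level = person.getStorageLevel

    val before = cachedPartitions(sc, person.id) // nothing stored until an action runs
    person.count()                               // the first action populates the cache
    val after = cachedPartitions(sc, person.id)

    (sameReference, level, before, after)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("cache-check").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try println(run(sc)) finally sc.stop()
  }
}
```

If the last value stays at zero after the count in the failing environment, that would suggest the partitions are not being stored at all, rather than the web UI failing to display them.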
 
On Wed, Apr 1, 2015 at 8:26 AM, Yuri Makhno <ymakhno@gmail.com> wrote:
> cache() method returns new RDD so you have to use something like this:
>
>  val person = sc.textFile("hdfs://namenode_host:8020/user/person.txt").
>    map(_.split(",")).
>    map(p => Person(p(0).trim.toInt, p(1)))
>
>  val cached = person.cache
>
>    cached.count
>
> when you rerun count on cached you will see that cache works
>
> On Wed, Apr 1, 2015 at 9:35 AM, fightfate@163.com <fightfate@163.com> wrote:
>>
>> Hi
>> That is exactly the issue. After running person.cache we then run
>> person.count; however, no cached data shows up under Storage in the
>> web UI.
>>
>> Thanks,
>> Sun.
>>
>> ________________________________
>> fightfate@163.com
>>
>>
>> From: Taotao.Li
>> Date: 2015-04-01 14:02
>> To: fightfate
>> CC: user
>> Subject: Re: rdd.cache() not working ?
>> Rerun person.count and you will see the effect of the cache.
>>
>> person.cache does not cache the RDD immediately. The RDD is actually
>> cached by the first action that runs afterwards (person.count here).
>>
>> ________________________________
>> From: fightfate@163.com
>> To: "user" <user@spark.apache.org>
>> Sent: Wednesday, April 1, 2015, 1:21:25 PM
>> Subject: rdd.cache() not working ?
>>
>> Hi, all
>>
>> I am running the following code snippet in spark-shell, but I cannot see
>> any cached partitions under Storage in the web UI.
>>
>> Does this mean that the cache is not working? If we issue person.count
>> again, we do not see any improvement in run time. I hope someone can
>> explain this.
>>
>> Best,
>>
>> Sun.
>>
>>    case class Person(id: Int, col1: String)
>>
>>    val person = sc.textFile("hdfs://namenode_host:8020/user/person.txt").
>>      map(_.split(",")).
>>      map(p => Person(p(0).trim.toInt, p(1)))
>>
>>    person.cache
>>
>>    person.count
>>
>> ________________________________
>> fightfate@163.com
>>
>>
>>
>> --
>>
>>
>> ---------------------------------------------------------------------------
>>
>> Thanks & Best regards
>>
>> 李涛涛 Taotao · Li  |  Fixed Income@Datayes  |  Software Engineer
>>
>> 地址:上海市浦东新区陆家嘴西路99号万向大厦8楼, 200120
>> Address: Wanxiang Tower 8F, Lujiazui West Rd. No.99, Pudong New District,
>> Shanghai, 200120
>>
>> 电话|Phone:021-60216502      手机|Mobile: +86-18202171279
>>
>>
>