spark-user mailing list archives

From 尹绪森 <yinxu...@gmail.com>
Subject Re: Does foreach operation increase rdd lineage?
Date Fri, 24 Jan 2014 13:03:19 GMT
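Do you mean Gibbs sampling? Note that foreach is an action, not a
transformation: it runs your function on each element on the workers and
returns nothing, so it does not create a new RDD or lengthen the lineage.
RDDs are immutable, however, so changes made to elements inside foreach are
not guaranteed to persist. The operation to avoid is collect, which pulls all
the data from the workers to the driver; with a dataset this large the JVM
will throw an OutOfMemoryError.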

I am not sure about your implementation, but if the data does not need to be
gathered in one place, you had better keep it on the workers.
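
To make the "keep it on the workers" idea concrete, here is a minimal sketch
in plain Python. It is not Spark code: the lists stand in for partitions, and
make_partitions, sweep, and run are hypothetical names made up for this
illustration. Each sampling pass rebuilds every partition where it lives and
rebinds the reference, so only one generation of the data is held, and only a
small (sum, count) pair per partition is sent back to the "driver". The
closest Spark equivalent would be a per-partition transformation followed by
a small aggregate.

```python
import random


def make_partitions(data, num_partitions):
    """Split data into chunks that stand in for partitions held on workers."""
    return [data[i::num_partitions] for i in range(num_partitions)]


def sweep(partition, rng):
    """Hypothetical sampling step: perturb each value with Gaussian noise."""
    return [x + rng.gauss(0.0, 0.1) for x in partition]


def run(data, num_partitions=4, num_sweeps=3, seed=0):
    partitions = make_partitions(data, num_partitions)
    for step in range(num_sweeps):
        # Rebinding `partitions` drops the previous generation, so only one
        # copy of the dataset is referenced at any time.
        partitions = [
            # One generator per (step, partition) keeps the run reproducible.
            sweep(p, random.Random(seed * 1000003 + step * 1009 + i))
            for i, p in enumerate(partitions)
        ]
    # Only a (sum, count) pair per partition travels to the "driver".
    summaries = [(sum(p), len(p)) for p in partitions]
    total = sum(s for s, _ in summaries)
    count = sum(c for _, c in summaries)
    return partitions, total / count


if __name__ == "__main__":
    data = [float(i) for i in range(100)]
    parts, mean = run(data)
    print(len(parts), "partitions, mean after sampling:", mean)
```

The full dataset is never concatenated in one place; the driver only ever
sees one small tuple per partition.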


2014/1/24 guojc <guojc03@gmail.com>

> Hi,
>    I'm writing a parallel MCMC program that keeps a very large dataset in
> memory, and I need to update the dataset in place and avoid creating an
> additional copy. Should I use a foreach operation on the RDD to express the
> change, or do I have to create a new RDD after each sampling pass?
>
> Thanks,
> Jiacheng Guo
>



-- 
Best Regards
-----------------------------------
Xusen Yin    尹绪森
Beijing Key Laboratory of Intelligent Telecommunications Software and
Multimedia
Beijing University of Posts & Telecommunications
Intel Labs China
Homepage: http://yinxusen.github.io/
