spark-user mailing list archives

From 尹绪森 <>
Subject Re: Does foreach operation increase rdd lineage?
Date Fri, 24 Jan 2014 13:03:19 GMT
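Do you mean "Gibbs sampling"? foreach is an action, so it does not add
anything to the RDD's lineage. It runs on the workers and returns nothing to
the driver; it is collect that pulls all the data from the workers to the
driver, where the JVM can fail with an OOM.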

I am not sure about your implementation, but if the data does not need to be
joined together, you had better keep it on the workers.
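
As a rough sketch of keeping the data on the workers, each iteration could
derive a new cached RDD and release the previous one. This assumes a local
master for the demo, and sampleStep is a hypothetical stand-in for the real
sampling update; neither comes from your description.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object IterativeUpdateSketch {
  // Hypothetical stand-in for one sampling step; replace with the real update.
  def sampleStep(x: Double): Double = x * 0.5 + 1.0

  // Runs `iterations` update rounds, keeping only the latest RDD cached.
  def run(sc: SparkContext, data: Seq[Double], iterations: Int): Array[Double] = {
    var current: RDD[Double] = sc.parallelize(data).cache()
    for (_ <- 1 to iterations) {
      val next = current.map(sampleStep).cache()
      next.count()         // action: materializes `next` on the workers
      current.unpersist()  // release the previous cached copy
      current = next
    }
    // collect() brings everything to the driver; acceptable only for tiny demo data.
    current.collect()
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode with two threads, purely for illustration.
    val conf = new SparkConf().setAppName("IterativeUpdateSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      println(run(sc, Seq(0.0, 2.0, 4.0), 3).mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```

Each map adds one step to the lineage, so for a long chain of iterations it
may be worth checkpointing now and then (sc.setCheckpointDir followed by
RDD.checkpoint) to cut the lineage.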

2014/1/24 guojc <>

> Hi,
>    I'm writing a parallel MCMC program that has a very large dataset in
> memory, and I need to update the dataset in memory and avoid creating an
> additional copy. Should I use a foreach operation on the RDD to express the
> change, or do I have to create a new RDD after each sampling process?
> Thanks,
> Jiacheng Guo

Best Regards
Xusen Yin    尹绪森
Beijing Key Laboratory of Intelligent Telecommunications Software and
Beijing University of Posts & Telecommunications
Intel Labs China
