spark-user mailing list archives

From Shuai Zheng <szheng.c...@gmail.com>
Subject Any "Replicated" RDD in Spark?
Date Mon, 03 Nov 2014 21:03:17 GMT
Hi All,

I have spent the last two years on Hadoop but am new to Spark. I am
planning to move one of my existing systems to Spark to get some
enhanced features.

My question is:

If I want to do a map-side join (something similar to the "replicated"
keyword in Pig), how can I do it? Is there any way to declare an RDD as
"replicated" (meaning it is distributed to all nodes and each node holds a
full copy)?
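To make the question concrete, this is the pattern I have in mind, built by
hand with a broadcast variable (just a sketch; the paths, parsing, and names
are placeholders):

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object MapSideJoinSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("MapSideJoinSketch"))

    // Small data set: collect it to the driver and ship a full copy to every node.
    val small = sc.textFile("hdfs:///data/small.csv")
      .map { line => val f = line.split(','); (f(0), f(1)) }
    val smallBc = sc.broadcast(small.collectAsMap())

    // Large data set: look up each key in the broadcast copy map-side, no shuffle.
    val large = sc.textFile("hdfs:///data/large.csv")
      .map { line => val f = line.split(','); (f(0), f(1)) }
    val joined = large.flatMap { case (k, v) =>
      smallBc.value.get(k).map(s => (k, (v, s)))
    }

    joined.saveAsTextFile("hdfs:///data/joined")
    sc.stop()
  }
}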

I know I can use a broadcast variable to get this feature, but I am not sure
what the best practice is. And if I use a broadcast variable to distribute
the data set, can I then (after the broadcast) convert it into an RDD and do
the join?
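That is, something along these lines, continuing from the sketch above
(again only a sketch):

// Rebuild an RDD from the broadcast value and do an ordinary join.
val smallAgain = sc.parallelize(smallBc.value.toSeq)
val joined2 = large.join(smallAgain)

My guess is that this join shuffles the data again, which seems to defeat
the point of the broadcast, hence the question about best practice.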

Regards,

Shuai
