spark-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: refer to dictionary
Date Tue, 31 Mar 2015 12:49:54 GMT
You can use a broadcast variable.

See also this thread:
http://search-hadoop.com/m/JW1q5GX7U22/Spark+broadcast+variable&subj=How+Broadcast+variable+scale+
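For example, a minimal sketch in PySpark (assuming an existing SparkContext named sc; the app name and sample data below are illustrative, not taken from your code):

from pyspark import SparkContext

sc = SparkContext(appName="broadcast-example")  # assumed setup

# Sample data mirroring the question
rdd1 = sc.parallelize([["a", "b", "c"], ["b", "c", "a"]])
dict1 = {"a": 1, "b": 2, "c": 3}

# Broadcasting ships one read-only copy of the dictionary to each executor
# instead of serializing it into every task closure.
bdict = sc.broadcast(dict1)

mapped = rdd1.map(lambda line: [bdict.value[item] for item in line])
print(mapped.collect())  # [[1, 2, 3], [2, 3, 1]]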



> On Mar 31, 2015, at 4:43 AM, Peng Xia <sparkpengxia@gmail.com> wrote:
> 
> Hi,
> 
> I have an RDD (rdd1) where each line is split into an array ["a", "b", "c"], etc.
> I also have a local dictionary (dict1) that stores key-value pairs {"a": 1, "b": 2, "c": 3}.
> I want to replace the keys in the RDD with their corresponding values in the dict:
> rdd1.map(lambda line: [dict1[item] for item in line])
> 
> But this task is not distributed; I believe the reason is that dict1 is a local instance.
> Can anyone provide suggestions on how to parallelize this?
> 
> 
> Thanks,
> Best,
> Peng
> 

