spark-user mailing list archives

From Dan Dong <dongda...@gmail.com>
Subject Re: How to share a Map among RDDS?
Date Wed, 22 Jul 2015 19:33:49 GMT
Thanks Andrew, exactly.

2015-07-22 14:26 GMT-05:00 Andrew Or <andrew@databricks.com>:

> Hi Dan,
>
> `map2` is a broadcast variable, not your map. To access the map on the
> executors you need to do `map2.value(a)`.
>
> -Andrew
>
> 2015-07-22 12:20 GMT-07:00 Dan Dong <dongdan39@gmail.com>:
>
>> Hi, Andrew,
>>   If I broadcast the Map:
>> val map2=sc.broadcast(map1)
>>
>> I get a compilation error:
>> org.apache.spark.broadcast.Broadcast[scala.collection.immutable.Map[Int,String]]
>> does not take parameters
>> [error]      val matchs= Vecs.map(term=>term.map{case (a,b)=>(map2(a),b)})
>>
>> It seems it's still an RDD, so how do I access a value with value = map2(key)? Thanks!
>>
>> Cheers,
>> Dan
>>
>>
>>
>> 2015-07-22 2:20 GMT-05:00 Andrew Or <andrew@databricks.com>:
>>
>>> Hi Dan,
>>>
>>> If the map is small enough, you can just broadcast it, can't you? It
>>> doesn't have to be an RDD. Here's an example of broadcasting an array and
>>> using it on the executors:
>>> https://github.com/apache/spark/blob/c03299a18b4e076cabb4b7833a1e7632c5c0dabe/examples/src/main/scala/org/apache/spark/examples/BroadcastTest.scala
>>> .
>>>
>>> -Andrew
>>>
>>> 2015-07-21 19:56 GMT-07:00 ayan guha <guha.ayan@gmail.com>:
>>>
>>>> Either do rdd.collect and then broadcast the result, or do a join.
>>>> On 22 Jul 2015 07:54, "Dan Dong" <dongdan39@gmail.com> wrote:
>>>>
>>>>> Hi, All,
>>>>>
>>>>>
>>>>> I am trying to access a Map from RDDs that are on different compute
>>>>> nodes, but without success. The Map is like:
>>>>>
>>>>> val map1 = Map("aa"->1,"bb"->2,"cc"->3,...)
>>>>>
>>>>> All RDDs will have to check against it to see whether a key is in the Map
>>>>> or not, so it seems I have to make the Map itself global. The problem is
>>>>> that if the Map is stored as an RDD and spread across the different nodes,
>>>>> each node will only see a piece of the Map, and the information will not be
>>>>> complete enough to check against the Map (and then replace the key with the
>>>>> corresponding value). E.g.:
>>>>>
>>>>> val matchs= Vecs.map(term=>term.map{case (a,b)=>(map1(a),b)})
>>>>>
>>>>> But if the Map is not an RDD, how do I share it, e.g. with sc.broadcast(map1)?
>>>>>
>>>>> Any idea about this? Thanks!
>>>>>
>>>>>
>>>>> Cheers,
>>>>> Dan
>>>>>
>>>>>
>>>
>>
>
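
For readers landing on this thread, the fix Andrew describes can be sketched as below. This is a minimal sketch rather than Dan's actual program. The element type of `Vecs` is never shown in the thread, so `RDD[Seq[(Int, Double)]]` is an assumption, and the map is taken to be `Map[Int, String]` as in the compiler error (the first post writes it with String keys). The object name `BroadcastMapExample`, the helper `replaceKeys`, the sample data, and the `local[2]` master are all hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.rdd.RDD

object BroadcastMapExample {

  // Replaces each key in every term with its value from the shared map.
  // The element type Seq[(Int, Double)] is an assumption; the thread does
  // not show how Vecs is defined.
  def replaceKeys(sc: SparkContext,
                  map1: Map[Int, String],
                  vecs: RDD[Seq[(Int, Double)]]): RDD[Seq[(String, Double)]] = {
    // sc.broadcast returns a Broadcast[Map[Int, String]] handle, not the map.
    val map2: Broadcast[Map[Int, String]] = sc.broadcast(map1)

    // map2(a) does not compile because Broadcast has no apply method.
    // map2.value returns the underlying Map on each executor.
    vecs.map(term => term.map { case (a, b) => (map2.value(a), b) })
  }

  def main(args: Array[String]): Unit = {
    // App name and local master are placeholders for illustration.
    val conf = new SparkConf().setAppName("BroadcastMapExample").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val map1 = Map(1 -> "aa", 2 -> "bb", 3 -> "cc")
      val vecs = sc.parallelize(Seq(Seq((1, 0.5), (2, 1.5)), Seq((3, 2.0))))
      replaceKeys(sc, map1, vecs).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Note that `map2.value(a)` throws a `NoSuchElementException` when `a` is not in the map. If some keys may be missing, as the original question suggests, `map2.value.get(a)` returns an `Option[String]` instead.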

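The two alternatives ayan mentions for a lookup table that already lives in an RDD (collect and broadcast, or join) might look like the following sketch. All names here (`SharedMapFromRdd`, `viaBroadcast`, `viaJoin`) and the flat `(Int, Double)` record type are hypothetical and chosen only to keep the example short. Both variants drop records whose key is not in the lookup table.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SharedMapFromRdd {

  // Option 1: collect the (small) lookup RDD to the driver, then broadcast it.
  def viaBroadcast(sc: SparkContext,
                   lookupRdd: RDD[(Int, String)],
                   pairs: RDD[(Int, Double)]): RDD[(String, Double)] = {
    val lookup = sc.broadcast(lookupRdd.collectAsMap())
    // get returns an Option, so records with unknown keys are skipped.
    pairs.flatMap { case (a, b) => lookup.value.get(a).map(name => (name, b)) }
  }

  // Option 2: keep the lookup table distributed and join on the key.
  // An inner join also discards records with unknown keys.
  def viaJoin(lookupRdd: RDD[(Int, String)],
              pairs: RDD[(Int, Double)]): RDD[(String, Double)] =
    pairs.join(lookupRdd).map { case (_, (b, name)) => (name, b) }

  def main(args: Array[String]): Unit = {
    // App name and local master are placeholders for illustration.
    val conf = new SparkConf().setAppName("SharedMapFromRdd").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val lookupRdd = sc.parallelize(Seq(1 -> "aa", 2 -> "bb", 3 -> "cc"))
      val pairs = sc.parallelize(Seq((1, 0.5), (2, 1.5), (4, 9.0)))
      viaBroadcast(sc, lookupRdd, pairs).collect().foreach(println)
      viaJoin(lookupRdd, pairs).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Broadcasting avoids a shuffle and suits a table that fits comfortably in memory on every executor. The join keeps the table distributed, which is the safer choice when it is too large to collect.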