spark-user mailing list archives

From Mark Hamstra <m...@clearstorydata.com>
Subject Re: .intersection() method on RDDs?
Date Fri, 24 Jan 2014 01:39:33 GMT
...or `keys` instead of `map(_._1)`.
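
For reference, the suggestions in this thread combine into roughly the sketch below. It assumes a local Spark master and two small sample RDDs; the object name `IntersectionSketch`, the helper `intersect`, and the sample data are hypothetical and exist only for illustration. The trailing `distinct()` is an addition not discussed in the thread: a join emits one row per matching pair, so duplicates in either input would otherwise show up more than once in the result.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._ // pair-RDD implicits; required on older Spark releases
import org.apache.spark.rdd.RDD

object IntersectionSketch {
  // Hypothetical helper, not part of the Spark API.
  // Key each element by itself with a dummy value, join on the key,
  // then keep only the keys that appeared in both RDDs.
  def intersect(a: RDD[String], b: RDD[String]): RDD[String] =
    a.map(v => (v, None))
      .join(b.map(v => (v, None)))
      .keys        // same as map(_._1)
      .distinct()  // assumption: set semantics are wanted, so drop duplicates

  def main(args: Array[String]): Unit = {
    // Local master and app name are placeholders for this sketch.
    val sc = new SparkContext("local", "intersection-sketch")
    try {
      val a = sc.parallelize(Seq("a", "b", "c"))
      val b = sc.parallelize(Seq("b", "c", "d"))
      println(intersect(a, b).collect().sorted.mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```

Compared with the union/subtract expression quoted below, this needs a single join (one shuffle of each input) rather than several subtract passes. Spark 1.0 and later also provide `RDD.intersection` directly, which makes a helper like this unnecessary on those versions.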


On Thu, Jan 23, 2014 at 5:36 PM, Evan R. Sparks <evan.sparks@gmail.com> wrote:

> Yup (well, with _._1 at the end!)
>
>
> On Thu, Jan 23, 2014 at 5:28 PM, Andrew Ash <andrew@andrewash.com> wrote:
>
>> You're thinking like this?
>>
>> A.map(v => (v,None)).join(B.map(v => (v,None))).map(_._2)
>>
>>
>> On Thu, Jan 23, 2014 at 6:26 PM, Evan R. Sparks <evan.sparks@gmail.com> wrote:
>>
>>> You could map each to an RDD[(String,None)] and do a join.
>>>
>>>
>>> On Thu, Jan 23, 2014 at 5:18 PM, Andrew Ash <andrew@andrewash.com> wrote:
>>>
>>>> Hi spark users,
>>>>
>>>> I recently wanted to calculate the set intersection of two RDDs of
>>>> Strings.  I couldn't find a .intersection() method in the autocomplete or
>>>> in the Scala API docs, so used a little set theory to end up with this:
>>>>
>>>> lazy val A = ...
>>>> lazy val B = ...
>>>> A.union(B).subtract(A.subtract(B)).subtract(B.subtract(A))
>>>>
>>>> Which feels very cumbersome.
>>>>
>>>> Does anyone have a more idiomatic way to calculate intersection?
>>>>
>>>> Thanks!
>>>> Andrew
>>>>
>>>
>>>
>>
>
