spark-user mailing list archives

From Sean Owen <so...@cloudera.com>
Subject Re: How can I apply such an inner join in Spark Scala/Python
Date Mon, 17 Nov 2014 18:24:54 GMT
Just use RDD.join(); it is an inner join.
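
For illustration, here is a minimal local sketch of what join() does on pair RDDs, written in plain Python with no Spark involved. `inner_join` is a hypothetical helper, not a Spark API. It assumes the usual pair-RDD semantics: only keys present on both sides survive, and a key that repeats on either side yields every (v, w) pairing. A and B are the values from the question quoted below.

```python
from collections import defaultdict

def inner_join(left, right):
    """Hypothetical local stand-in for pair-RDD join(): keep only keys
    present on both sides and emit every (v, w) pairing per key."""
    # Group right-side values by key so each left record can find its matches.
    right_by_key = defaultdict(list)
    for k, w in right:
        right_by_key[k].append(w)
    # Pair each left record with every right value sharing its key;
    # keys missing from either side produce no output (inner join).
    return [(k, (v, w)) for k, v in left for w in right_by_key.get(k, [])]

A = [(1, 2), (2, 4), (3, 6)]
B = [(1, 3), (2, 5), (3, 6), (4, 5), (5, 6)]

C = inner_join(A, B)
print(C)  # [(1, (2, 3)), (2, (4, 5)), (3, (6, 6))]
```

On a real cluster, the PySpark equivalent is `sorted(sc.parallelize(A).join(sc.parallelize(B)).collect())`, assuming `sc` is an existing SparkContext. In Scala it is `rddA.join(rddB)`. The `sorted` call is needed because Spark does not guarantee the order of the joined records.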

On Mon, Nov 17, 2014 at 5:51 PM, Blind Faith <person.of.book@gmail.com> wrote:
> So let us say I have RDDs A and B with the following values.
>
> A = [ (1, 2), (2, 4), (3, 6) ]
>
> B = [ (1, 3), (2, 5), (3, 6), (4, 5), (5, 6) ]
>
> I want to apply an inner join, such that I get the following as a result.
>
> C = [ (1, (2, 3)), (2, (4, 5)), (3, (6, 6)) ]
>
> That is, keys that are not present in A should disappear after the
> inner join.
>
> How can I achieve that? I can see outerJoin functions but no innerJoin
> functions in the Spark RDD class.


