spark-user mailing list archives

From SK <>
Subject specifying fields for join()
Date Thu, 12 Jun 2014 23:25:09 GMT

I want to join two RDDs on specific fields.

The first RDD is a set of tuples of the form: (ID, ACTION, TIMESTAMP, LOCATION).
The second RDD is a set of tuples of the form: (ID, TIMESTAMP).

rdd2 is a subset of rdd1. ID is a string. I want to join the two so that I
can get the location corresponding to each timestamp value in rdd2. The join
has to be on the (ID, TIMESTAMP) fields.
I tried rdd1.join(rdd2), but got a compilation error.

It appears that in Spark, the join function does not take the joining fields
as arguments and joins only on the keys.
What is the right way to do the above join?
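
For reference, here is a minimal sketch of one possible approach: re-key both RDDs by the (ID, TIMESTAMP) pair so that join() has a common key to match on. It assumes the fourth field of rdd1 is a location string and that timestamps are Longs; the sample data, the object name JoinOnFields, and the names keyed1, keyed2, and locations are made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
// Pair-RDD implicits; required on Spark versions before 1.3, harmless afterwards.
import org.apache.spark.SparkContext._

object JoinOnFields {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("JoinOnFields").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Hypothetical sample data. The fourth field (location) is an assumption.
    val rdd1 = sc.parallelize(Seq(
      ("u1", "click", 100L, "NYC"),
      ("u1", "view", 200L, "SF"),
      ("u2", "click", 100L, "LA")))
    val rdd2 = sc.parallelize(Seq(("u1", 200L), ("u2", 100L)))

    // join() is only defined on RDDs of (key, value) pairs, so reshape both
    // sides to use the composite (ID, TIMESTAMP) tuple as the key.
    val keyed1 = rdd1.map { case (id, action, ts, loc) => ((id, ts), (action, loc)) }
    val keyed2 = rdd2.map(pair => (pair, ())) // no payload needed on this side

    // Inner join on the composite key, then keep only the fields of interest.
    val locations = keyed1.join(keyed2).map {
      case ((id, ts), ((_, loc), _)) => (id, ts, loc)
    }

    locations.collect().foreach(println)
    sc.stop()
  }
}
```

Calling rdd1.join(rdd2) directly does not compile because join() is available only on RDDs whose elements are two-element (key, value) tuples, and rdd1 holds four-element tuples.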

Thanks for your help.
