spark-user mailing list archives

From Davies Liu <dav...@databricks.com>
Subject Re: PySpark joins fail - please help
Date Sat, 18 Oct 2014 00:48:00 GMT
Hey Russell,

join() only works on RDDs of (key, value) pairs, such as

rdd1: (k, v1)
rdd2: (k, v2)

rdd1.join(rdd2) will be (k, (v1, v2))
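
A rough sketch of those semantics in plain Python, so it runs without Spark. The helper join_pairs and the sample records are made up for illustration and are not taken from the gist; in PySpark the reshaping step would be an rdd.map(...) and the join would be rdd1.join(rdd2).

```python
from collections import defaultdict


def join_pairs(left, right):
    """Inner-join two lists of (key, value) pairs, mimicking RDD.join()."""
    right_by_key = defaultdict(list)
    for k, v in right:
        right_by_key[k].append(v)
    # One output record per matching (v1, v2) combination for each key.
    return [(k, (v1, v2)) for k, v1 in left for v2 in right_by_key.get(k, [])]


# Hypothetical records with more than two fields: not yet (key, value) pairs.
emails = [("a@x.com", "Alice", 3), ("b@x.com", "Bob", 5)]
logins = [("a@x.com", "2014-10-17"), ("c@x.com", "2014-10-16")]

# Reshape each record into (key, value) first.
# PySpark equivalent: emails_rdd.map(lambda r: (r[0], r[1:]))
emails_kv = [(r[0], r[1:]) for r in emails]

joined = join_pairs(emails_kv, logins)
print(joined)
```

Only keys present on both sides survive, and each result is a pair whose value is itself a (v1, v2) tuple.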

Spark SQL will be more useful for you, see
http://spark.apache.org/docs/1.1.0/sql-programming-guide.html

Davies


On Fri, Oct 17, 2014 at 5:01 PM, Russell Jurney <russell.jurney@gmail.com>
wrote:

> https://gist.github.com/rjurney/fd5c0110fe7eb686afc9
>
> Any way I try to join my data fails. I can't figure out what I'm doing
> wrong.
>
> --
> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com
>
