spark-user mailing list archives

From Michael Armbrust <mich...@databricks.com>
Subject Re: Spark Language Integrated SQL for join on expression
Date Tue, 30 Sep 2014 01:13:24 GMT
I'll note that the DSL is pretty experimental.  That said, you should be
able to do something like "user.id".attr
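
For example (a hedged sketch against the Spark 1.x SchemaRDD DSL, not tested
code -- the alias names and the exact imports are assumptions):

```scala
// Sketch only: alias each SchemaRDD, then qualify columns with
// "alias.column".attr so the two clashing id columns can be told apart.
// Assumes the catalyst DSL implicits are in scope (e.g. via the shell's
// sqlContext imports).
import org.apache.spark.sql.catalyst.plans.Inner

val joined = user.as('u).join(
  order.as('o),
  Inner,
  Some("u.id".attr === "o.userid".attr))
```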

On Mon, Sep 29, 2014 at 3:39 PM, Benyi Wang <bewang.tech@gmail.com> wrote:

> scala> user
> res19: org.apache.spark.sql.SchemaRDD =
> SchemaRDD[0] at RDD at SchemaRDD.scala:98
> == Query Plan ==
> ParquetTableScan [id#0,name#1], (ParquetRelation
> /user/hive/warehouse/user), None
>
> scala> order
> res20: org.apache.spark.sql.SchemaRDD =
> SchemaRDD[72] at RDD at SchemaRDD.scala:98
> == Query Plan ==
> ParquetTableScan [id#8,userid#9,unit#10], (ParquetRelation
> /user/hive/warehouse/orders), None
>
> Joining the SchemaRDDs user and order like this raises an ambiguity
> error because both tables have an 'id column:
>
> user.join(order, on=Some('id === 'userid))
>
> How can I write an expression that qualifies a column with its SchemaRDD
> name, something like 'user.'id? That expression currently doesn't work in
> Spark 1.0.0 on CDH 5.1.0.
>
