spark-user mailing list archives

From Michael Armbrust <mich...@databricks.com>
Subject Re: Best practice to avoid ambiguous columns in DataFrame.join
Date Fri, 15 May 2015 22:55:56 GMT
There are several ways to solve this ambiguity:

*1. Use the DataFrames to get the attribute, so it's already "resolved" and
not just a string we need to map to a DataFrame.*

df.join(df2, df("_1") === df2("_1"))
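A fuller, self-contained sketch of option 1, assuming Spark 2.x or later (SparkSession and spark.implicits._ rather than the 1.x sqlContext used in this thread). The object and method names (ResolvedColumnJoin, joined) are hypothetical. It also shows that the joined result still carries both _1 and both _2 columns, so the same df(...) / df2(...) references are reused in select to give each output column a distinct name.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical example object; assumes Spark 2.x+ on the classpath.
object ResolvedColumnJoin {
  // Joins the thread's toy data using columns resolved through their own DataFrames.
  def joined(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Tuple-based DataFrames get the default column names _1 and _2.
    val df  = Seq((1, "a"), (2, "b"), (3, "b"), (4, "b")).toDF()
    val df2 = Seq((1, 10), (2, 20), (3, 30), (4, 40)).toDF()

    // df("_1") and df2("_1") are bound to their source DataFrames,
    // so the join condition is unambiguous despite the shared name.
    df.join(df2, df("_1") === df2("_1"))
      // The raw join output has _1, _2, _1, _2; selecting through the same
      // references and aliasing them yields distinct output names.
      .select(df("_1").as("key"), df("_2").as("left_2"), df2("_2").as("right_2"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("resolved-join").getOrCreate()
    try joined(spark).show()
    finally spark.stop()
  }
}
```

This relies on df and df2 being distinct DataFrames; when a DataFrame is joined with itself, alias-based qualification (option 2) tends to be the more reliable choice.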

*2. Use aliases*

df.as('a).join(df2.as('b), $"a._1" === $"b._1")
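A sketch of option 2 under the same assumptions (Spark 2.x+, hypothetical object name AliasedJoin). It uses string aliases via alias("a") instead of the symbol literal 'a, since symbol literals are deprecated in Scala 2.13; the qualified names such as a._2 and b._2 remain usable in select after the join.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical example object; assumes Spark 2.x+ on the classpath.
object AliasedJoin {
  // Qualifies each side with an alias so columns can be referenced as "alias.column".
  def joined(spark: SparkSession): DataFrame = {
    import spark.implicits._

    val df  = Seq((1, "a"), (2, "b"), (3, "b"), (4, "b")).toDF()
    val df2 = Seq((1, 10), (2, 20), (3, 30), (4, 40)).toDF()

    // "a._1" and "b._1" are qualified names, so the condition is unambiguous.
    df.alias("a").join(df2.alias("b"), col("a._1") === col("b._1"))
      // The same qualifiers disambiguate columns picked from the joined result.
      .select(col("a._1").as("key"), col("a._2").as("left_2"), col("b._2").as("right_2"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("aliased-join").getOrCreate()
    try joined(spark).show()
    finally spark.stop()
  }
}
```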

*3. Rename the columns as you suggested.*

df.join(df2.withColumnRenamed("_1", "right_key"), $"_1" === $"right_key").printSchema
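Renaming one column at a time with withColumnRenamed works, but when several columns overlap it can be simpler to rename every right-hand column at once with toDF. A minimal sketch assuming Spark 2.x+; the "right" prefix and the RenamedJoin name are arbitrary choices for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical example object; assumes Spark 2.x+ on the classpath.
object RenamedJoin {
  def joined(spark: SparkSession): DataFrame = {
    import spark.implicits._

    val df  = Seq((1, "a"), (2, "b"), (3, "b"), (4, "b")).toDF()
    val df2 = Seq((1, 10), (2, 20), (3, 30), (4, 40)).toDF()

    // Rename every right-hand column in one pass (_1 -> right_1, _2 -> right_2),
    // so no name on the right can collide with one on the left.
    val right = df2.toDF(df2.columns.map(name => s"right$name"): _*)

    // Plain string column references are now unique across both sides.
    df.join(right, $"_1" === $"right_1")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("renamed-join").getOrCreate()
    try joined(spark).printSchema()
    finally spark.stop()
  }
}
```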

*4. (Spark 1.4 and later) Use def join(right: DataFrame, usingColumn: String): DataFrame*

df.join(df2, "_1")

This has the added benefit of outputting only a single _1 column.
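A sketch of option 4, assuming Spark 2.x+, where the overload join(right, usingColumns: Seq[String], joinType: String) is available. The using-column form de-duplicates only the join key: both inputs also have a _2 column, so this hypothetical example renames the right one first (right_2 is an illustrative name) to keep the whole output unambiguous.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical example object; assumes Spark 2.x+ on the classpath.
object UsingColumnJoin {
  def joined(spark: SparkSession): DataFrame = {
    import spark.implicits._

    val df  = Seq((1, "a"), (2, "b"), (3, "b"), (4, "b")).toDF()
    val df2 = Seq((1, 10), (2, 20), (3, 30), (4, 40)).toDF()

    // A using-column join only merges the key, so the other shared
    // column (_2) on the right is renamed before joining.
    val right = df2.withColumnRenamed("_2", "right_2")

    // Equi-join on the shared column name; _1 appears once in the output,
    // followed by the left's remaining columns, then the right's.
    df.join(right, Seq("_1"), "inner")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("using-join").getOrCreate()
    try joined(spark).printSchema()
    finally spark.stop()
  }
}
```

Without the rename, df.join(df2, Seq("_1")) would still produce a single _1 but two columns named _2, so a later reference to _2 would be ambiguous.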

On Fri, May 15, 2015 at 3:44 PM, Justin Yip <yipjustin@prediction.io> wrote:

> Hello,
>
> I would like to know if there are recommended ways of preventing
> ambiguous columns when joining DataFrames. When we join DataFrames, we
> usually join on columns with identical names. I could rename the
> columns on the right DataFrame, as in the following code. Is
> there a better way to achieve this?
>
> scala> val df = sqlContext.createDataFrame(Seq((1, "a"), (2, "b"), (3, "b"), (4, "b")))
> df: org.apache.spark.sql.DataFrame = [_1: int, _2: string]
>
> scala> val df2 = sqlContext.createDataFrame(Seq((1, 10), (2, 20), (3, 30), (4, 40)))
> df2: org.apache.spark.sql.DataFrame = [_1: int, _2: int]
>
> scala> df.join(df2.withColumnRenamed("_1", "right_key"), $"_1" === $"right_key").printSchema
>
> Thanks.
>
> Justin
>
