spark-user mailing list archives

From Justin Yip <yipjus...@prediction.io>
Subject Best practice to avoid ambiguous columns in DataFrame.join
Date Fri, 15 May 2015 22:44:40 GMT
Hello,

I would like to know whether there is a recommended way to prevent ambiguous
columns when joining DataFrames. When we join DataFrames, we usually join on
columns that have identical names. I could rename the column on the right
DataFrame, as in the following code. Is there a better way to achieve this?

scala> val df = sqlContext.createDataFrame(Seq((1, "a"), (2, "b"), (3, "b"), (4, "b")))
df: org.apache.spark.sql.DataFrame = [_1: int, _2: string]

scala> val df2 = sqlContext.createDataFrame(Seq((1, 10), (2, 20), (3, 30), (4, 40)))
df2: org.apache.spark.sql.DataFrame = [_1: int, _2: int]

scala> df.join(df2.withColumnRenamed("_1", "right_key"), $"_1" === $"right_key").printSchema
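
Note that the join above still yields two columns named _2, one from each side. A minimal sketch of the same renaming idea applied to every column, assuming a spark-shell session where sqlContext and its implicits are already in scope; the names left, right, key, label, and value are hypothetical and chosen only for illustration:

```scala
// Give every column a distinct name up front with toDF, so nothing collides.
val left = sqlContext
  .createDataFrame(Seq((1, "a"), (2, "b"), (3, "b"), (4, "b")))
  .toDF("key", "label")
val right = sqlContext
  .createDataFrame(Seq((1, 10), (2, 20), (3, 30), (4, 40)))
  .toDF("right_key", "value")

// No name is shared between the two sides, so both the join condition and
// the columns of the result can be referenced without ambiguity.
val joined = left.join(right, $"key" === $"right_key")
joined.printSchema
```

This is still the rename workaround, not a different technique; it only removes the leftover duplicate name.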

Thanks.

Justin



