spark-user mailing list archives

From Xinh Huynh <xinh.hu...@gmail.com>
Subject Re: How can I join two DataSet of same case class?
Date Fri, 11 Mar 2016 16:32:50 GMT
I think you have to use an alias, because both Datasets have the same column
names and an unqualified column reference is ambiguous. To give each Dataset
an alias:

val d1 = a.as("d1")
val d2 = b.as("d2")

Then join, using the alias in the column names:
d1.joinWith(d2, $"d1.edid" === $"d2.edid")

Finally, please double-check your column names. I did not see "edid" in your
case class.
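
A fuller sketch of the aliased join, assuming Spark 2.0 or later (SparkSession;
on 1.6, SQLContext and its implicits play the same role) and assuming the
intended join key is recentEdid, since the case class has no edid field. The
trimmed-down Stat class, the sample rows, and the AliasJoinSketch object are
made up for illustration:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical, trimmed-down version of the Stat case class from the question.
case class Stat(dcid: Int, recentEdid: Int, avg: Double)

object AliasJoinSketch {
  // Builds two small Datasets of the same case class and joins them by alias.
  def run(): Array[(Stat, Stat)] = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("alias-join-sketch")
      .getOrCreate()
    import spark.implicits._ // brings in toDS() and the $"..." column syntax

    try {
      // Made-up sample data; only recentEdid = 10 appears on both sides.
      val a = Seq(Stat(1, 10, 0.5), Stat(2, 20, 1.5)).toDS()
      val b = Seq(Stat(3, 10, 2.5)).toDS()

      // Alias each side so the column references are no longer ambiguous.
      val d1 = a.as("d1")
      val d2 = b.as("d2")

      // Assumption: recentEdid is the column the question meant by "edid".
      d1.joinWith(d2, $"d1.recentEdid" === $"d2.recentEdid").collect()
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit =
    run().foreach(println)
}
```

joinWith returns a Dataset of pairs, so each result row keeps both Stat
objects intact rather than flattening them into one row of columns.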

Xinh

On Thu, Mar 10, 2016 at 9:09 PM, 박주형 <dkdkajej@gmail.com> wrote:

> Hi. I want to join two Datasets, but the following error appears on stderr:
>
> 16/03/11 13:55:51 WARN ColumnName: Constructing trivially true equals
> predicate, ''edid = 'edid'. Perhaps you need to use aliases.
> Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot
> resolve 'edid' given input columns dataType, avg, sigma, countUnique,
> numRows, recentEdid, categoryId, accCount, statType, categoryId, max,
> accCount, firstQuarter, recentEdid, replicationRateAvg, numRows, min,
> countNotNull, countNotNull, dcid, numDistinctRows, max, firstQuarter, min,
> replicationRateAvg, dcid, statType, avg, sigma, dataType, median,
> thirdQuarter, numDistinctRows, median, countUnique, thirdQuarter;
>
>
> My case class is:
> case class Stat(statType: Int, dataType: Int, dcid: Int,
>     categoryId: Int, recentEdid: Int, countNotNull: Int, countUnique: Int,
>     accCount: Int, replicationRateAvg: Double,
>     numDistinctRows: Double, numRows: Double,
>     min: Double, max: Double, sigma: Double, avg: Double,
>     firstQuarter: Double, thirdQuarter: Double, median: Double)
>
> and my code is
> a.joinWith(b, $"edid" === $"edid").show()
>
> If I use DataFrames, renaming a's columns solves it. How can I join two
> Datasets of the same case class?
>
