spark-issues mailing list archives

From "Tim Gautier (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-15620) Dataset.map creates a dataset that can't be self-joined
Date Fri, 27 May 2016 19:32:13 GMT
Tim Gautier created SPARK-15620:
-----------------------------------

             Summary: Dataset.map creates a dataset that can't be self-joined
                 Key: SPARK-15620
                 URL: https://issues.apache.org/jira/browse/SPARK-15620
             Project: Spark
          Issue Type: Bug
    Affects Versions: 1.6.1
         Environment: EC2, Spark-shell
            Reporter: Tim Gautier


Given this case class and Dataset:
{code}
case class Test(id: Int)
val test = Seq(
  Test(1),
  Test(2),
  Test(3)
).toDS
{code}

'test' can be joined with itself successfully:
{code}
test.as("t1").joinWith(test.as("t2"), $"t1.id" === $"t2.id").show
{code}

However, mapping 'test' like this
{code}
val testMapped = test.map(t => t.copy(id = t.id + 1))
{code}
results in a new Dataset that can't be joined to itself:
{code}
testMapped.as("t1").joinWith(testMapped.as("t2"), $"t1.id" === $"t2.id").show
{code}
Yields:
{noformat}
scala> testMapped.as("t1").joinWith(testMapped.as("t2"), $"t1.id" === $"t2.id").show
org.apache.spark.sql.AnalysisException: cannot resolve 't1.id' given input columns: [id];
{noformat}

Mapping to the bare id also produces a Dataset whose self-join throws an error:
{code}
val testMapped2 = test.map(_.id)
testMapped2.as("t1").joinWith(testMapped2.as("t2"), $"t1.value" === $"t2.value").show
{code}
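
For anyone trying this outside spark-shell, the snippets above can be gathered into one standalone program. This is only a sketch under assumptions: the object name SelfJoinRepro, the attempt helper, the app name, and the local[*] master are all made up for illustration, and the setup uses the Spark 1.6 SQLContext entry point that spark-shell normally provides for you. Each join is wrapped in Try so that all three cases run even when one of them fails.
{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import scala.util.{Failure, Success, Try}

// Defined at top level so the Dataset encoder can be derived for it.
case class Test(id: Int)

// Hypothetical wrapper object; not part of the original report.
object SelfJoinRepro {

  // Hypothetical helper: run one case and report whether it threw.
  def attempt(label: String)(body: => Unit): Unit =
    Try(body) match {
      case Success(_) => println(s"$label: ok")
      case Failure(e) => println(s"$label: failed with ${e.getClass.getName}: ${e.getMessage}")
    }

  def main(args: Array[String]): Unit = {
    // Assumed local setup; spark-shell creates sc and sqlContext itself.
    val conf = new SparkConf().setAppName("SPARK-15620-repro").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._ // brings in toDS and the $"..." column syntax

    val test = Seq(Test(1), Test(2), Test(3)).toDS

    // Case 1: self-join of the original Dataset.
    attempt("plain self-join") {
      test.as("t1").joinWith(test.as("t2"), $"t1.id" === $"t2.id").show()
    }

    // Case 2: self-join after mapping to the same case class.
    val testMapped = test.map(t => t.copy(id = t.id + 1))
    attempt("mapped self-join") {
      testMapped.as("t1").joinWith(testMapped.as("t2"), $"t1.id" === $"t2.id").show()
    }

    // Case 3: self-join after mapping to the bare Int.
    val testMapped2 = test.map(_.id)
    attempt("mapped-to-Int self-join") {
      testMapped2.as("t1").joinWith(testMapped2.as("t2"), $"t1.value" === $"t2.value").show()
    }

    sc.stop()
  }
}
{code}
On the affected version, the behaviour described in this report would show up as the first case succeeding and the other two reporting a failure. I have not verified this on other versions.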




