spark-issues mailing list archives

From "Hyukjin Kwon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-9435) Java UDFs don't work with GROUP BY expressions
Date Tue, 10 Jan 2017 04:24:58 GMT

    [ https://issues.apache.org/jira/browse/SPARK-9435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813832#comment-15813832 ]

Hyukjin Kwon commented on SPARK-9435:
-------------------------------------

This still happens in the current master:

{code}
import org.apache.spark.sql.api.java.UDF1
import org.apache.spark.sql.types.IntegerType
import spark.implicits._

val df = Seq((1, 10), (2, 11), (3, 12)).toDF("x", "y")
val udf = new UDF1[Int, Int] {
  override def call(i: Int): Int = i + 1
}

spark.udf.register("inc", udf, IntegerType)
df.createOrReplaceTempView("tmp")
spark.sql("SELECT inc(y) FROM tmp GROUP BY inc(y)").show()
{code}

I tested both the Scala and Java versions, and I believe the above is a simpler Scala
reproduction of the same issue.
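The reporter's equality hypothesis (quoted below) can be sketched without Spark at all. Note the {{UDF1}} trait here is a hypothetical stand-in for Spark's {{org.apache.spark.sql.api.java.UDF1}}, used only so the example runs standalone: two structurally identical anonymous implementations of the interface are still distinct objects, so default (reference) equality reports them as different, which is exactly the kind of mismatch that would make the analyzer fail to match {{inc(y)}} in SELECT against {{inc(y)}} in GROUP BY.

```scala
object EqualityDemo {
  // Hypothetical stand-in for org.apache.spark.sql.api.java.UDF1,
  // so this example has no Spark dependency.
  trait UDF1[T, R] { def call(t: T): R }

  // Two structurally identical anonymous implementations.
  val a = new UDF1[Int, Int] { override def call(i: Int): Int = i + 1 }
  val b = new UDF1[Int, Int] { override def call(i: Int): Int = i + 1 }

  def main(args: Array[String]): Unit = {
    // Neither instance overrides equals, so == falls back to
    // reference equality: the two "identical" UDFs compare unequal.
    println(a == b) // false
    println(a == a) // true
  }
}
```

If Spark's expression equality ends up comparing the wrapped function objects this way, a Scala function literal registered twice could still compare equal through the plan, while two references to a Java UDF wrapper might not.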

> Java UDFs don't work with GROUP BY expressions
> ----------------------------------------------
>
>                 Key: SPARK-9435
>                 URL: https://issues.apache.org/jira/browse/SPARK-9435
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.4.1
>         Environment: All
>            Reporter: James Aley
>         Attachments: IncMain.java, points.txt
>
>
> If you define a UDF in Java, for example by implementing the UDF1 interface, and then
> try to use that UDF on a column in both the SELECT and GROUP BY clauses of a query,
> you'll get an error like this:
> {code}
> "SELECT inc(y),COUNT(DISTINCT x) FROM test_table GROUP BY inc(y)"
> org.apache.spark.sql.AnalysisException: expression 'y' is neither present in the group
> by, nor is it an aggregate function. Add to group by or wrap in first() if you don't care
> which value you get.
> {code}
> We put together a minimal reproduction in the attached Java file, which makes use of
> the data in the attached text file.
> I'm guessing there's some kind of issue with the equality implementation, so perhaps
> Spark can't tell that those two expressions are the same? If you do the same thing
> from Scala, it works fine.
> Note for context: we ran into this issue while working around SPARK-9338.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
