spark-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Andrew Leverentz (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-28225) Unexpected behavior for Window functions
Date Tue, 02 Jul 2019 01:03:00 GMT
Andrew Leverentz created SPARK-28225:
----------------------------------------

             Summary: Unexpected behavior for Window functions
                 Key: SPARK-28225
                 URL: https://issues.apache.org/jira/browse/SPARK-28225
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: Andrew Leverentz


I've noticed some odd behavior when combining the "first" aggregate function with an ordered
Window.

In particular, I'm working with columns created using the syntax
{code}
first($"y", ignoreNulls = true).over(Window.orderBy($"x"))
{code}
Below, I'm including some code which reproduces this issue in a Databricks notebook.

*Code:*
{code:java}
import org.apache.spark.sql.functions.first
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType,StructField,IntegerType}

val schema = StructType(Seq(
  StructField("x", IntegerType, false),
  StructField("y", IntegerType, true),
  StructField("z", IntegerType, true)
))

val input =
  spark.createDataFrame(sc.parallelize(Seq(
    Row(101, null, 11),
    Row(102, null, 12),
    Row(103, null, 13),
    Row(203, 24, null),
    Row(201, 26, null),
    Row(202, 25, null)
  )), schema = schema)

input.show

val output = input
  .withColumn("u1", first($"y", ignoreNulls = true).over(Window.orderBy($"x".asc_nulls_last)))
  .withColumn("u2", first($"y", ignoreNulls = true).over(Window.orderBy($"x".asc)))
  .withColumn("u3", first($"y", ignoreNulls = true).over(Window.orderBy($"x".desc_nulls_last)))
  .withColumn("u4", first($"y", ignoreNulls = true).over(Window.orderBy($"x".desc)))
  .withColumn("u5", first($"z", ignoreNulls = true).over(Window.orderBy($"x".asc_nulls_last)))
  .withColumn("u6", first($"z", ignoreNulls = true).over(Window.orderBy($"x".asc)))
  .withColumn("u7", first($"z", ignoreNulls = true).over(Window.orderBy($"x".desc_nulls_last)))
  .withColumn("u8", first($"z", ignoreNulls = true).over(Window.orderBy($"x".desc)))

output.show
{code}
*Expectation:*

Based on my understanding of how ordered-Window and aggregate functions work, the results
I expected to see were:
 * u1 = u2 = constant value of 26
 * u3 = u4 = constant value of 24
 * u5 = u6 = constant value of 11
 * u7 = u8 = constant value of 13

However, columns u1, u2, u7, and u8 contain some unexpected nulls. 

*Results:*
{code:java}
+---+----+----+----+----+---+---+---+---+----+----+
|  x|   y|   z|  u1|  u2| u3| u4| u5| u6|  u7|  u8|
+---+----+----+----+----+---+---+---+---+----+----+
|203|  24|null|  26|  26| 24| 24| 11| 11|null|null|
|202|  25|null|  26|  26| 24| 24| 11| 11|null|null|
|201|  26|null|  26|  26| 24| 24| 11| 11|null|null|
|103|null|  13|null|null| 24| 24| 11| 11|  13|  13|
|102|null|  12|null|null| 24| 24| 11| 11|  13|  13|
|101|null|  11|null|null| 24| 24| 11| 11|  13|  13|
+---+----+----+----+----+---+---+---+---+----+----+
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org


Mime
View raw message