spark-dev mailing list archives

From Herman van Hövell tot Westerflier <hvanhov...@questtec.nl>
Subject Re: Lead operator not working as aggregation operator
Date Mon, 02 Nov 2015 10:45:58 GMT
Hi,

This is more of a question for the user list.

Lead and Lag depend on an ordering of the rows, and calling them as plain
aggregates (without a window specification) is not supported. Use Lead/Lag
as a window function with an ORDER BY in the OVER clause and you'll be fine:

*select lead(max(expenses)) over (order by customerId) from tbl group by customerId*
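
To make the semantics of that query concrete, here is a rough plain-Python
sketch of what it computes: first the per-customer maximum (the GROUP BY),
then, with customers ordered by customerId, the maximum of the next customer
(the LEAD over the ordered window). This is only an illustration, not Spark
code; the function name lead_of_group_max and the sample rows are made up
for the example.

```python
def lead_of_group_max(rows):
    """rows: iterable of (customer_id, expense) pairs.

    Returns a list of (customer_id, next_customer_max) ordered by customer_id,
    mimicking lead(max(expenses)) over (order by customerId) ... group by customerId.
    """
    # Step 1: GROUP BY customerId with max(expenses).
    max_by_customer = {}
    for customer_id, expense in rows:
        current = max_by_customer.get(customer_id)
        if current is None or expense > current:
            max_by_customer[customer_id] = expense

    # Step 2: order the grouped rows by customerId (the window's ORDER BY).
    ordered = sorted(max_by_customer.items())

    # Step 3: LEAD with the default offset of 1; the last row has no successor,
    # so it gets None (NULL in SQL).
    result = []
    for i, (customer_id, _) in enumerate(ordered):
        next_max = ordered[i + 1][1] if i + 1 < len(ordered) else None
        result.append((customer_id, next_max))
    return result


# Hypothetical sample data: (customerId, expenses)
rows = [(1, 10.0), (1, 25.0), (2, 7.5), (3, 40.0), (3, 12.0)]
print(lead_of_group_max(rows))
```

Without the ordering in step 2, "the next row" has no defined meaning, which
is why the window needs an ORDER BY.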

HTH

Met vriendelijke groet/Kind regards,

Herman van Hövell tot Westerflier

QuestTec B.V.
Torenwacht 98
2353 DC Leiderdorp
hvanhovell@questtec.nl
+31 6 420 590 27


2015-11-02 11:33 GMT+01:00 Shagun Sodhani <sshagunsodhani@gmail.com>:

> Hi! I was trying out window functions in SparkSql (using hive context)
> and I noticed that while this
> <https://issues.apache.org/jira/browse/TAJO-919?jql=text%20~%20%22lag%20window%22>
> mentions that *lead* is implemented as an aggregate operator, it seems
> not to be the case.
>
> I am using the following configuration:
>
> Query : SELECT lead(max(`expenses`)) FROM `table` GROUP BY `customerId`
> Spark Version: 10.4
> SparkSql Version: 1.5.1
>
> I am using the standard example of a (`customerId`, `expenses`) schema where
> each customer has multiple values for expenses (though I am setting age as
> Double and not Int as I am trying out maths functions).
>
>
> *java.lang.NullPointerException at
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFLeadLag.evaluate(GenericUDFLeadLag.java:57)*
>
> The entire error stack can be found here <http://pastebin.com/jTRR4Ubx>.
>
> Can someone confirm if this is an actual issue or some oversight on my
> part?
>
> Thanks!
>
