spark-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Takuya Ueshin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-23911) High-order function: reduce(array<T>, initialState S, inputFunction<S, T, S>, outputFunction<S, R>) → R
Date Fri, 03 Aug 2018 05:23:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-23911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567791#comment-16567791
] 

Takuya Ueshin commented on SPARK-23911:
---------------------------------------

[~smilegator] [~hvanhovell] I'd use {{aggregate}} instead of {{reduce}} for this function
name because {{aggregate}} better fits the functionality of the function. WDYT?

> High-order function: reduce(array<T>, initialState S, inputFunction<S, T, S>,
outputFunction<S, R>) → R
> -------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-23911
>                 URL: https://issues.apache.org/jira/browse/SPARK-23911
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Xiao Li
>            Assignee: Herman van Hovell
>            Priority: Major
>
> Ref: https://prestodb.io/docs/current/functions/array.html
> Returns a single value reduced from array. inputFunction will be invoked for each element
in array in order. In addition to taking the element, inputFunction takes the current state,
initially initialState, and returns the new state. outputFunction will be invoked to turn
the final state into the result value. It may be the identity function (i -> i).
> {noformat}
> SELECT reduce(ARRAY [], 0, (s, x) -> s + x, s -> s); -- 0
> SELECT reduce(ARRAY [5, 20, 50], 0, (s, x) -> s + x, s -> s); -- 75
> SELECT reduce(ARRAY [5, 20, NULL, 50], 0, (s, x) -> s + x, s -> s); -- NULL
> SELECT reduce(ARRAY [5, 20, NULL, 50], 0, (s, x) -> s + COALESCE(x, 0), s -> s);
-- 75
> SELECT reduce(ARRAY [5, 20, NULL, 50], 0, (s, x) -> IF(x IS NULL, s, s + x), s ->
s); -- 75
> SELECT reduce(ARRAY [2147483647, 1], CAST (0 AS BIGINT), (s, x) -> s + x, s ->
s); -- 2147483648
> SELECT reduce(ARRAY [5, 6, 10, 20], -- calculates arithmetic average: 10.25
>               CAST(ROW(0.0, 0) AS ROW(sum DOUBLE, count INTEGER)),
>               (s, x) -> CAST(ROW(x + s.sum, s.count + 1) AS ROW(sum DOUBLE, count
INTEGER)),
>               s -> IF(s.count = 0, NULL, s.sum / s.count));
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org


Mime
View raw message