spark-dev mailing list archives

From Petar Zečević <>
Subject Re: Array indexing functions
Date Thu, 07 Feb 2019 08:19:48 GMT

As far as I know, these are not standard functions.

Writing UDFs is easy, but only in Java and Scala are they as efficient as built-in functions.
When using Python, data movement/conversion to/from Arrow is still necessary, and that makes
a difference in performance. That was the motivation behind these two.

I'd object to the rule of not implementing functions not found anywhere else, but there seems
to be a consensus around this, so I'll just close the JIRA.


Sean Owen <> writes:

> Is it standard SQL or implemented in Hive? Because UDFs are so relatively easy in Spark
> we don't need tons of builtins like an RDBMS does.
> On Tue, Feb 5, 2019, 7:43 AM Petar Zečević <> wrote:
>  Hi everybody,
>  I finally created the JIRA ticket and the pull request for the two array indexing functions:
>  Can any of the committers please check it out?
>  Thanks,
>  Petar
>  Petar Zečević <> writes:
>  > Hi,
>  > I implemented two array functions that are useful to us and I wonder if you think
>  > it would be useful to add them to the distribution. The functions are used for filtering arrays
>  > based on indexes:
>  >
>  > array_allpositions (named after array_position) - takes a column and a value and
>  > returns an array of the column's indexes corresponding to elements equal to the provided value
>  >
>  > array_select - takes an array column and an array of indexes and returns a subset
>  > of the array based on the provided indexes.
>  >
>  > If you agree with this addition I can create a JIRA ticket and a pull request.
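
For readers of the archive, the semantics described for the two functions above can be sketched in plain Python. This is only an illustration and not the code from the pull request. The 1-based indexing (chosen here to mirror array_position), the `base` parameter, and returning None for out-of-range indexes are all assumptions that the thread does not confirm.

```python
from typing import Any, List, Optional, Sequence


def array_allpositions(
    arr: Optional[Sequence[Any]], value: Any, base: int = 1
) -> Optional[List[int]]:
    """Return the indexes of every element of `arr` equal to `value`.

    Assumption: indexes are 1-based by default, like array_position.
    """
    if arr is None:
        return None
    return [i + base for i, x in enumerate(arr) if x == value]


def array_select(
    arr: Optional[Sequence[Any]], indexes: Optional[Sequence[int]], base: int = 1
) -> Optional[List[Any]]:
    """Return the elements of `arr` found at the given `indexes`.

    Assumption: an index outside the array yields None instead of an error.
    """
    if arr is None or indexes is None:
        return None
    result = []
    for idx in indexes:
        pos = idx - base  # convert to Python's 0-based position
        result.append(arr[pos] if 0 <= pos < len(arr) else None)
    return result


letters = ["a", "b", "a", "c", "a"]
positions = array_allpositions(letters, "a")  # indexes of every "a"
selected = array_select(letters, positions)   # elements at those indexes
print(positions, selected)
```

Feeding the output of array_allpositions on one array into array_select on another array of the same length would be one way to filter the second array by the first, which seems to be the use case described in the original message.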