spark-user mailing list archives

From Anton Okolnychyi <anton.okolnyc...@gmail.com>
Subject Re: Spark Aggregator for array of doubles
Date Wed, 04 Jan 2017 22:34:14 GMT
Hi,

Take a look at this pull request, which has not been merged yet:
https://github.com/apache/spark/pull/16329
It contains Java and Scala examples that may be helpful.

Best regards,
Anton Okolnychyi

On Jan 4, 2017 23:23, "Anil Langote" <anillangote0106@gmail.com> wrote:

> Hi All,
>
> I have been working on a use case where I have a DataFrame with 25
> columns: 24 columns are of type string and the last column is an array of
> doubles. For a given set of columns I have to group by and add the arrays
> of doubles element-wise. I have implemented a UDAF, which works fine but
> is expensive. While tuning the solution I came across Aggregators, which
> can be implemented and used with the agg function. My question is: how can
> we implement an Aggregator that takes an array of doubles as input and
> returns an array of doubles?
>
> I learned that it is not possible to implement an Aggregator in Java and
> that it can only be done in Scala. How can I define an Aggregator that
> takes an array of doubles as input? Note that my input is a Parquet file.
>
> Any pointers are highly appreciated. I read that Spark UDAFs are slow and
> that Aggregators are the way to go.
>
> Best Regards,
>
> Anil Langote
>
> +1-425-633-9747
>
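
For illustration, here is a minimal plain-Java sketch of the buffer logic that such an Aggregator would need to compute an element-wise sum of double arrays. It deliberately has no Spark dependency. The class name `ArraySumLogic` is hypothetical, and its static methods only mirror the `zero`, `reduce`, `merge` and `finish` methods of `org.apache.spark.sql.expressions.Aggregator`. A real Aggregator would also have to supply `bufferEncoder` and `outputEncoder`, which are omitted here. The sketch assumes every input array is non-null and that all arrays have the same length.

```java
import java.util.Arrays;

// Hypothetical class (not part of Spark): plain-Java sketch of the buffer
// logic for an element-wise sum of double arrays. Method names mirror
// Spark's Aggregator (zero, reduce, merge, finish), but nothing here depends
// on Spark. Assumes all input arrays are non-null and of equal length.
class ArraySumLogic {

    // zero: an empty array acts as the identity; the real length is taken
    // from the first array that is reduced or merged into it.
    static double[] zero() {
        return new double[0];
    }

    // reduce: add one input row's array into the buffer, element by element.
    static double[] reduce(double[] buffer, double[] input) {
        if (buffer.length == 0) {
            // Copy so that later in-place updates never modify the input row.
            return Arrays.copyOf(input, input.length);
        }
        for (int i = 0; i < buffer.length; i++) {
            buffer[i] += input[i];
        }
        return buffer;
    }

    // merge: combine two partial sums, e.g. from different partitions.
    static double[] merge(double[] left, double[] right) {
        if (left.length == 0) {
            return right;
        }
        if (right.length == 0) {
            return left;
        }
        for (int i = 0; i < left.length; i++) {
            left[i] += right[i];
        }
        return left;
    }

    // finish: the buffer already holds the final element-wise sum.
    static double[] finish(double[] reduction) {
        return reduction;
    }

    public static void main(String[] args) {
        // Two "partitions" of rows belonging to the same group key.
        double[] part1 = reduce(reduce(zero(), new double[] {1.0, 2.0, 3.0}),
                                new double[] {4.0, 5.0, 6.0});
        double[] part2 = reduce(zero(), new double[] {10.0, 20.0, 30.0});
        double[] total = finish(merge(part1, part2));
        System.out.println(Arrays.toString(total)); // prints [15.0, 27.0, 39.0]
    }
}
```

The sketch updates the buffer in place in `reduce` and `merge` so that it does not allocate a new array for every input row. The Spark `Aggregator` documentation explicitly allows `reduce` to modify and return its buffer in this way.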
