spark-user mailing list archives

From Anton Okolnychyi <>
Subject Re: Spark Aggregator for array of doubles
Date Wed, 04 Jan 2017 22:34:14 GMT

Take a look at this pull request, which is not merged yet: . It contains examples in Java
and Scala that may be helpful.
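In the meantime, here is a minimal sketch of such an Aggregator in Scala. It sums arrays of doubles element-wise. The column names (`key`, `doubles`), the use of `Encoders.kryo` for the buffer, and the `groupByKey` usage are assumptions for illustration, not taken from your code:

```scala
import org.apache.spark.sql.{Encoder, Encoders, Row, SparkSession}
import org.apache.spark.sql.expressions.Aggregator

// Hypothetical element-wise sum of an array-of-doubles column.
// Assumes every non-empty array in a group has the same length.
object SumArrayOfDoubles extends Aggregator[Row, Array[Double], Array[Double]] {

  // Empty buffer; the real length is picked up from the first row seen.
  def zero: Array[Double] = Array.emptyDoubleArray

  // Fold one input row's array into the running buffer, element-wise.
  def reduce(buffer: Array[Double], row: Row): Array[Double] = {
    // "doubles" is an assumed column name holding array<double>.
    val values = row.getSeq[Double](row.fieldIndex("doubles")).toArray
    if (buffer.isEmpty) values
    else buffer.zip(values).map { case (a, b) => a + b }
  }

  // Combine partial buffers coming from different partitions.
  def merge(b1: Array[Double], b2: Array[Double]): Array[Double] =
    if (b1.isEmpty) b2
    else if (b2.isEmpty) b1
    else b1.zip(b2).map { case (a, b) => a + b }

  def finish(reduction: Array[Double]): Array[Double] = reduction

  // Kryo keeps the encoders simple for an Array[Double] buffer/output.
  def bufferEncoder: Encoder[Array[Double]] = Encoders.kryo[Array[Double]]
  def outputEncoder: Encoder[Array[Double]] = Encoders.kryo[Array[Double]]
}
```

A possible usage, grouping a DataFrame (read from your Parquet file) by one assumed string column named `key`:

```scala
val df = spark.read.parquet("/path/to/input")  // assumed path
val summed = df
  .groupByKey(row => row.getString(row.fieldIndex("key")))(Encoders.STRING)
  .agg(SumArrayOfDoubles.toColumn)
```

For grouping by all 24 string columns you would build a composite key (e.g. a tuple or concatenated string) in the `groupByKey` function instead.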

Best regards,
Anton Okolnychyi

On Jan 4, 2017 23:23, "Anil Langote" <> wrote:

> Hi All,
>
> I have been working on a use case where I have a DataFrame with 25 columns:
> 24 columns of type string, and the last column is an array of doubles. For a
> given set of columns I have to apply a group by and sum the arrays of
> doubles. I have implemented a UDAF which works fine, but it is expensive. To
> tune the solution I came across Aggregators, which can be implemented and
> used with the agg function. My question is: how can we implement an
> Aggregator that takes an array of doubles as input and returns an array of
> doubles?
>
> I learned that it is not possible to implement the Aggregator in Java and
> that it can be done only in Scala. How can I define an Aggregator that takes
> an array of doubles as input? Note that my input is a Parquet file.
>
> Any pointers are highly appreciated. I read that a Spark UDAF is slow and
> Aggregators are the way to go.
>
> Best Regards,
> Anil Langote
> +1-425-633-9747
