Take a look at this pull request (https://github.com/apache/spark/pull/16329), which has not been merged yet. It contains examples in Java and Scala that may be helpful.
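
For reference, here is a minimal sketch of a typed Aggregator that sums arrays of doubles. It is not taken from the pull request. It assumes that "add" means an element-wise sum, that all arrays in a group have the same length, and that Spark 2.1 or later is used (for mapValues). The names ArraySum and ArraySumExample and the sample data are hypothetical.

```scala
import org.apache.spark.sql.{Encoder, SparkSession}
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import org.apache.spark.sql.expressions.Aggregator

// Hypothetical aggregator: element-wise sum of double arrays.
// Assumption: every array in a group has the same length
// (zip silently truncates to the shorter one otherwise).
object ArraySum extends Aggregator[Seq[Double], Seq[Double], Seq[Double]] {
  // An empty buffer stands for "no input seen yet".
  def zero: Seq[Double] = Seq.empty[Double]

  // Element-wise addition that treats an empty sequence as the identity.
  private def add(a: Seq[Double], b: Seq[Double]): Seq[Double] =
    if (a.isEmpty) b
    else if (b.isEmpty) a
    else a.zip(b).map { case (x, y) => x + y }

  // Fold one input array into the running buffer.
  def reduce(buffer: Seq[Double], input: Seq[Double]): Seq[Double] = add(buffer, input)

  // Combine two partial buffers computed on different partitions.
  def merge(b1: Seq[Double], b2: Seq[Double]): Seq[Double] = add(b1, b2)

  // The buffer already is the result.
  def finish(reduction: Seq[Double]): Seq[Double] = reduction

  def bufferEncoder: Encoder[Seq[Double]] = ExpressionEncoder[Seq[Double]]()
  def outputEncoder: Encoder[Seq[Double]] = ExpressionEncoder[Seq[Double]]()
}

object ArraySumExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("array-sum-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data: one grouping key plus the array column.
    val ds = Seq(
      ("a", Seq(1.0, 2.0)),
      ("a", Seq(3.0, 4.0)),
      ("b", Seq(5.0, 6.0))
    ).toDS()

    ds.groupByKey(_._1)        // group on the key column
      .mapValues(_._2)         // keep only the array as aggregator input
      .agg(ArraySum.toColumn)  // typed aggregation
      .show(false)

    spark.stop()
  }
}
```

With 24 string columns, the grouping key would presumably be a tuple or case class built from the chosen columns. Whether this is faster than the UDAF is something to measure on the actual data.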

Anton Okolnychyi

On Jan 4, 2017 23:23, "Anil Langote" <anillangote0106@gmail.com> wrote:
Hi All,

I have been working on a use case where I have a DataFrame with 25 columns: 24 columns are of type string and the last column is an array of doubles. For a given set of columns I have to group by them and add up the arrays of doubles. I have implemented a UDAF, which works fine but is expensive. While trying to tune the solution I came across Aggregators, which can be implemented and used with the agg function. My question is: how can we implement an Aggregator that takes an array of doubles as input and returns an array of doubles?

I learned that it's not possible to implement the Aggregator in Java and that it can be done in Scala only. How can I define an Aggregator that takes an array of doubles as input? Note that my input is a Parquet file.

Any pointers are highly appreciated. I read that Spark UDAFs are slow and that Aggregators are the way to go.

Anil Langote