spark-user mailing list archives

From Anil Langote <>
Subject Spark Aggregator for array of doubles
Date Wed, 04 Jan 2017 22:23:15 GMT
Hi All,

I have been working on a use case with a DataFrame that has 25 columns: 24 columns of
type string and a last column that is an array of doubles. For a given set of columns I have to
group by and sum the arrays of doubles element-wise. I have implemented a UDAF that works, but it's
expensive. While looking for ways to tune the solution I came across Aggregators, which can be
implemented and used with the agg function. My question is: how can we implement an Aggregator
that takes an array of doubles as input and returns an array of doubles?
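For illustration, here is a minimal sketch of such an Aggregator in Scala (the object name, input type, and usage below are my own assumptions, not tested code; it assumes every row's array has the same length):

```scala
import org.apache.spark.sql.{Encoder, Dataset}
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import org.apache.spark.sql.expressions.Aggregator

// Hypothetical Aggregator: element-wise sum of an array-of-doubles column.
object SumDoubleArrays extends Aggregator[Seq[Double], Seq[Double], Seq[Double]] {

  // Empty buffer acts as the identity for element-wise addition.
  def zero: Seq[Double] = Seq.empty[Double]

  // Fold one input row into the running buffer.
  def reduce(buf: Seq[Double], input: Seq[Double]): Seq[Double] =
    if (buf.isEmpty) input
    else buf.zip(input).map { case (a, b) => a + b }

  // Combine partial buffers coming from different partitions.
  def merge(b1: Seq[Double], b2: Seq[Double]): Seq[Double] =
    if (b1.isEmpty) b2
    else if (b2.isEmpty) b1
    else b1.zip(b2).map { case (a, b) => a + b }

  // The final result is just the accumulated buffer.
  def finish(reduction: Seq[Double]): Seq[Double] = reduction

  def bufferEncoder: Encoder[Seq[Double]] = ExpressionEncoder[Seq[Double]]()
  def outputEncoder: Encoder[Seq[Double]] = ExpressionEncoder[Seq[Double]]()
}
```

With the typed Dataset API it would then be applied along the lines of `ds.groupByKey(r => groupingKey(r)).agg(SumDoubleArrays.toColumn)`, where `groupingKey` is a hypothetical function selecting the grouping columns.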

I learned that it's not possible to implement the Aggregator in Java, only in Scala. How can I
define an Aggregator that takes an array of doubles as input? Note that my input is a
parquet file.

Any pointers are highly appreciated. I read that Spark UDAFs are slow and Aggregators are the
way to go.

Best Regards,
Anil Langote