spark-user mailing list archives

From Kumar sp <>
Subject Avoiding Multiple GroupBy
Date Mon, 18 Feb 2019 14:34:17 GMT
Can we avoid the multiple groupBy? I have a million records and this is a
performance concern.

Below is my query. Even with window functions I suspect it would be a
performance hit; can you please advise if there is a better alternative?
I need to get the maximum number of equipments per house across a list of dates.

import org.apache.spark.sql.functions.{countDistinct, max}

// First pass: distinct equipment count per (house, date);
// second pass: max of those counts per house.
// Note: the drop("date") in the original is redundant, since
// groupBy("house") already keeps only the grouping column and aggregates.
ds.groupBy("house", "date")
  .agg(countDistinct("equiId") as "count")
  .groupBy("house")
  .agg(max("count") as "noOfEquipments")
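To make the intent of the two-step aggregation concrete, here is a minimal sketch of the same computation on plain Scala collections, with made-up sample data and tuple fields standing in for the `house`, `date`, and `equiId` columns (no Spark required):

```scala
object GroupBySketch {
  def main(args: Array[String]): Unit = {
    // Each record: (house, date, equiId) -- illustrative sample data
    val records = Seq(
      ("h1", "2019-02-01", "e1"),
      ("h1", "2019-02-01", "e2"),
      ("h1", "2019-02-02", "e1"),
      ("h2", "2019-02-01", "e3")
    )

    // Step 1: distinct equipment count per (house, date),
    // mirroring groupBy("house", "date").agg(countDistinct("equiId"))
    val perHouseDate: Map[(String, String), Int] =
      records.groupBy(r => (r._1, r._2))
             .map { case (key, rs) => key -> rs.map(_._3).distinct.size }

    // Step 2: max of those counts per house,
    // mirroring groupBy("house").agg(max("count"))
    val noOfEquipments: Map[String, Int] =
      perHouseDate.groupBy { case ((house, _), _) => house }
                  .map { case (house, m) => house -> m.values.max }

    // For the sample data: h1 saw 2 distinct equiIds on its busiest date, h2 saw 1
    println(noOfEquipments)
  }
}
```

This only illustrates the semantics; in Spark each `groupBy` step is a shuffle over the full dataset, which is why the query involves two shuffles regardless of how it is written.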

