spark-user mailing list archives

From yohann jardin <>
Subject Re: DataFrame multiple agg on the same column
Date Sat, 07 Oct 2017 17:33:14 GMT
Hey Somasundaram,

Using a map is only one way to call agg; see the GroupedData API documentation for the complete list of overloads.

Using the first overload, agg(Column expr, Column... exprs):
grouped_txn.agg(count(lit(1)), sum('amount), max('amount), min('created_time), max('created_time)).show

Yohann Jardin

On 10/7/2017 at 7:12 PM, Somasundaram Sekar wrote:

I have a GroupedData object on which I perform aggregations over a few columns. Since GroupedData
takes a map, I cannot perform multiple aggregates on the same column, say both
max and min of amount.

So the line of code below returns only one aggregate per column:

grouped_txn.agg({'*' : 'count', 'amount' : 'sum', 'amount' : 'max', 'created_time' : 'min',
'created_time' : 'max'})
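A note on why only one aggregate survives: the dict literal itself silently keeps only the last value for each repeated key, so Spark never even sees the earlier entries. A minimal pure-Python demonstration (`spec` is a hypothetical name for the dict above):

```python
# A Python dict literal keeps only the last value for a duplicated key,
# so 'sum' of amount and 'min' of created_time are discarded before
# the dict ever reaches agg().
spec = {'*': 'count', 'amount': 'sum', 'amount': 'max',
        'created_time': 'min', 'created_time': 'max'}
print(spec)
# {'*': 'count', 'amount': 'max', 'created_time': 'max'}
```

This is why the dict form of agg can never express two aggregates over one column, regardless of what Spark supports.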

What are the possible alternatives? I could define a new column that is just a copy of
the original and use that, but that looks ugly. Any suggestions?

Somasundaram S
