spark-user mailing list archives

From Andrew Vykhodtsev <>
Subject pyspark.GroupedData.agg works incorrectly when one column is aggregated twice?
Date Fri, 27 May 2016 11:28:00 GMT
Dear list,

I am trying to calculate sum and count on the same column:

user_id_books_clicks =

If I do it like that, it only returns one (the last) aggregate.

But if I change it to .agg({'user_id': 'count', 'is_booking': 'sum'}), it
gives me both. I am on 1.6.1. Is this fixed in 2.x? Or should I report it as a bug?
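For what it's worth, this is most likely ordinary Python dict behaviour rather than a Spark bug: a dict literal cannot hold the same key twice, so when the same column name is used for two aggregates, the later entry silently overwrites the earlier one before Spark ever sees the spec. A minimal sketch (the `is_booking` column name is taken from the example above; the DataFrame and workaround are illustrative assumptions):

```python
# A Python dict literal keeps only the LAST value for a duplicated key,
# so .agg({'is_booking': 'sum', 'is_booking': 'count'}) only ever
# receives the 'count' entry -- the 'sum' is dropped before Spark sees it.
spec = {'is_booking': 'sum', 'is_booking': 'count'}
print(spec)  # {'is_booking': 'count'}

# A hedged workaround sketch: pass explicit Column expressions instead of
# a dict, which lets the same column be aggregated more than once.
# (Assumes a DataFrame `df` with 'user_id' and 'is_booking' columns;
# not executed here.)
#
#   from pyspark.sql import functions as F
#   df.groupBy('user_id').agg(
#       F.sum('is_booking').alias('bookings'),
#       F.count('is_booking').alias('clicks'),
#   )
```

Since the deduplication happens in the Python dict itself, no Spark version change would affect it; the list-of-Columns form of agg() avoids the issue entirely.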
