spark-user mailing list archives

From 崔苗 <cuim...@danale.com>
Subject Fw:multiple group by action
Date Sat, 25 Aug 2018 02:55:07 GMT

-------- Forwarding messages --------
From: "崔苗" <cuimiao@danale.com>
Date: 2018-08-25 10:54:31
To: dev@spark.apache.org
Subject: multiple group by action
Hi,
We have some user data with the columns (userId, company, client, country, region, city),
and we want to count distinct userId values grouped by several column combinations, such as:
select count(distinct userId) group by company
select count(distinct userId) group by company,client
select count(distinct userId) group by company,client,country
select count(distinct userId) group by company,client,country,region
etc.
Each of these actions triggers its own shuffle stage, even though the grouping columns are
nested: for example, the columns (company, client) contain the column company.
Is there a way to reduce the number of shuffle stages?
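
For example, the four queries above could perhaps be combined into one statement using GROUPING SETS. This is only a sketch: the table name `users` is an assumption, and we have not verified how many shuffle stages it actually produces.

```sql
-- Sketch only: `users` is a hypothetical table/view name for the data described above.
-- Each grouping set corresponds to one of the separate queries.
SELECT company, client, country, region,
       COUNT(DISTINCT userId) AS user_count
FROM users
GROUP BY company, client, country, region
GROUPING SETS (
  (company),
  (company, client),
  (company, client, country),
  (company, client, country, region)
)
```

Columns that are not part of a given grouping set come back as NULL in that row.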


Thanks for any replies.