spark-user mailing list archives

From אורן שמון <oren.sha...@gmail.com>
Subject Hi all,
Date Tue, 31 Oct 2017 13:17:43 GMT
I have two Spark jobs: the first is a pre-processing job and the second is the processing job.
The processing job needs to run a calculation for each user in the data.
I want to avoid a shuffle (such as the one caused by groupBy), so I am thinking of either saving
the pre-processing result bucketed by user in Parquet, or repartitioning by user
and then saving the result.

Which is preferred, and why?
Thanks in advance,
Oren