spark-user mailing list archives

From <Saif.A.Ell...@wellsfargo.com>
Subject Applying a limit after orderBy of big dataframe hangs spark
Date Fri, 05 Aug 2016 18:54:05 GMT
Hi all,

I am working with a 1.5 billion row DataFrame on a small cluster and trying to apply an orderBy
operation on one of its Long-typed columns.

If I limit the sorted output to some number of rows, say 5 million, then trying to count, persist or store
the resulting DataFrame makes Spark crash, with lost executors and hangs.
Without the limit after the orderBy, everything works normally, i.e. writing all
1.5 billion sorted rows back out completes fine.
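
For reference, here is a minimal sketch of the two cases described above. It is an illustration rather than the actual job: the Parquet source and sink, the paths, and the column name "id" are all hypothetical placeholders, and only the orderBy / limit / count sequence comes from the description.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.col

object OrderByLimitSketch {
  // Assumptions (not from the original report): Parquet input/output,
  // the paths passed in, and a Long column named "id".
  def run(sqlContext: SQLContext, inputPath: String, outputPath: String): Unit = {
    // Roughly 1.5 billion rows in the reported case.
    val df: DataFrame = sqlContext.read.parquet(inputPath)

    // Sort on one of the Long-typed columns.
    val sorted: DataFrame = df.orderBy(col("id"))

    // Case reported as working: write the full sorted DataFrame.
    sorted.write.parquet(outputPath + "/sorted_full")

    // Case reported as failing: limit the sorted output, then count
    // (persisting or writing `limited` reportedly fails the same way).
    val limited: DataFrame = sorted.limit(5000000)
    println(limited.count())
  }
}
```

From spark-shell this could be invoked as `OrderByLimitSketch.run(sqlContext, "/path/in", "/path/out")`, with both paths again being placeholders.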

Any thoughts? I am using Spark 1.6.0 with Scala 2.11.

Saif

