spark-user mailing list archives

From <Saif.A.Ell...@wellsfargo.com>
Subject RE: Applying a limit after orderBy of big dataframe hangs spark
Date Fri, 05 Aug 2016 20:00:07 GMT
Hi thanks for the assistance,


1.       Standalone

2.       df.orderBy(field).limit(5000).write.parquet(...)
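For reference, a minimal, self-contained sketch of the pipeline in item 2 above, written against the Spark 1.6 / Scala API mentioned later in the thread. The column name "id", the generated row count, and the output paths are placeholders for illustration only; the real sort field and data were not given in the thread.

    // Sketch of the reported orderBy + limit + write pipeline (Spark 1.6 API).
    // Column name, row count, and paths are assumptions, not values from the thread.
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object OrderByLimitRepro {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("orderBy-limit-repro")
        val sc = new SparkContext(conf)
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._

        // Small stand-in for the real 1.5 billion row dataframe.
        val df = sc.parallelize(1L to 1000000L).map(i => (i, i % 100)).toDF("id", "bucket")

        // The pattern reported to lose executors and hang on the large dataset:
        df.orderBy("id").limit(5000).write.parquet("/tmp/ordered_limited")

        // The pattern reported to work: the same orderBy, written without the limit.
        // df.orderBy("id").write.parquet("/tmp/ordered_full")

        sc.stop()
      }
    }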

From: Mich Talebzadeh [mailto:mich.talebzadeh@gmail.com]
Sent: Friday, August 05, 2016 4:33 PM
To: Ellafi, Saif A.
Cc: user @spark
Subject: Re: Applying a limit after orderBy of big dataframe hangs spark

Hi,

  1.  What scheduler are you using: standalone, YARN, etc.?
  2.  How are you limiting the df output?

HTH




Dr Mich Talebzadeh



LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction
of data or any other property which may arise from relying on this email's technical content
is explicitly disclaimed. The author will in no case be liable for any monetary damages arising
from such loss, damage or destruction.



On 5 August 2016 at 19:54, <Saif.A.Ellafi@wellsfargo.com> wrote:
Hi all,

I am working with a 1.5 billion row dataframe in a small cluster and trying to apply an orderBy
operation on one of the Long type columns.

If I limit that output to some number, say 5 million, then trying to count, persist or store
the dataframe makes Spark lose executors and hang.
Without the limit after the orderBy, everything works normally, i.e. writing the full 1.5 billion
rows again works fine.

Any thoughts? Using Spark 1.6.0, Scala 2.11.

Saif

