spark-user mailing list archives

From Tarun Garg <>
Subject YARN workers running out of disk space
Date Fri, 26 Jun 2015 17:41:20 GMT

I am running a Spark job over YARN. After 2-3 hours of execution the workers
start dying, and I found a lot of files named temp_shuffle left behind on disk.
My job has three different pipelines (a sketch follows below):
1. foreachRDD()
2. mapToPair().reduceByKey().foreachRDD()
3. flatMapToPair().groupByKeyAndWindow().map().foreachRDD()
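
For reference, a minimal Java sketch of those three pipelines against the
Spark Streaming API. The socket source, batch interval, window sizes, and the
foreachRDD bodies are placeholders of my own, not the real job; only the
operator chains match the description above.

import java.util.ArrayList;
import java.util.List;

import scala.Tuple2;

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class ThreePipelines {
  public static void main(String[] args) throws Exception {
    SparkConf conf = new SparkConf().setAppName("three-pipelines-sketch");
    JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

    // Placeholder source; the original job's input is not shown in the post.
    JavaDStream<String> lines = jssc.socketTextStream("localhost", 9999);

    // 1. No shuffle: act directly on each micro-batch.
    lines.foreachRDD(rdd -> System.out.println("batch count: " + rdd.count()));

    // 2. First shuffle: reduceByKey repartitions records by key, writing
    //    temp_shuffle files on the executors' local disks.
    lines.mapToPair(line -> new Tuple2<>(line, 1))
         .reduceByKey((a, b) -> a + b)
         .foreachRDD(rdd -> System.out.println("reduced count: " + rdd.count()));

    // 3. Second shuffle: groupByKeyAndWindow shuffles the windowed data.
    lines.flatMapToPair(line -> {
           List<Tuple2<String, Integer>> out = new ArrayList<>();
           for (String word : line.split(" ")) {
             out.add(new Tuple2<>(word, 1));
           }
           return out.iterator();
         })
         .groupByKeyAndWindow(Durations.seconds(60), Durations.seconds(10))
         .map(pair -> pair._1())
         .foreachRDD(rdd -> System.out.println("windowed count: " + rdd.count()));

    jssc.start();
    jssc.awaitTermination();
  }
}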

In this job I am shuffling twice, which raises two questions:
1. My understanding is that shuffle data is deleted from disk after each
shuffle, so why is it not getting deleted this time?
2. Since I am still configuring the environment, I kill the processes very
often; does that leave the shuffle data on disk?

Any thoughts on this?

