spark-user mailing list archives

From Gustavo Enrique Salazar Torres <gsala...@ime.usp.br>
Subject Problem when sorting big file
Date Thu, 15 May 2014 21:55:19 GMT
Hi there:

I have a dataset (about 12 GB) that I need to sort by key.
I used the sortByKey method, but when I try to save the result to disk (HDFS
in this case), some tasks seem to run out of time because they have too much
data to save and it doesn't fit in memory.
I say this because, before the TimeOut exception on the worker, there is an
OOM exception from a specific task.
My question is: is this a common problem in Spark? Has anyone run into
this issue?
The cause of the problem seems to be an unbalanced distribution of data
across tasks.
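
For context, the job is essentially the following (the master URL, the
paths, the partition count, and the tab-separated parsing are placeholders,
not my exact code):

  import org.apache.spark.SparkContext
  import org.apache.spark.SparkContext._  // pair-RDD implicits (sortByKey, etc.)

  object SortByKeyJob {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext("spark://master:7077", "SortByKeyJob")

      // Parse each line into a (key, value) pair; tab-separated
      // input is just an example.
      val pairs = sc.textFile("hdfs:///data/input").map { line =>
        val tab = line.indexOf('\t')
        (line.substring(0, tab), line.substring(tab + 1))
      }

      // sortByKey accepts an explicit partition count; a larger value
      // spreads the sorted output over more, smaller tasks.
      val sorted = pairs.sortByKey(true, 400)

      sorted.saveAsTextFile("hdfs:///data/output")
      sc.stop()
    }
  }

My understanding is that sortByKey range-partitions the data, so all records
with the same key land in the same partition; if one key is very frequent,
that partition stays oversized no matter how many partitions I ask for,
which is why I suspect the unbalanced distribution.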

I would appreciate any help.

Thanks
Gustavo
