spark-user mailing list archives

From Abhishek Anand <abhis.anan...@gmail.com>
Subject Disk Full on one Worker is leading to Job Stuck and Executor Unresponsive
Date Thu, 31 Mar 2016 18:32:15 GMT
Hi,

Why is it that when the disk is full on one of my workers, the executor on
that worker becomes unresponsive and the jobs on that worker fail with the
following exception?


16/03/29 10:49:00 ERROR DiskBlockObjectWriter: Uncaught exception while
reverting partial writes to file
/data/spark-e2fc248f-a212-4a99-9d6c-4e52d6a69070/executor-37679a6c-cb96-451e-a284-64d6b4fe9910/blockmgr-f8ca72f4-f329-468b-8e65-ef97f8fb285c/38/temp_shuffle_8f266d70-3fc6-41e5-bbaa-c413a7b08ea4
java.io.IOException: No space left on device


This is leading to my job getting stuck.

As a workaround I have to kill the executor and clear space on the disk;
the worker then relaunches a new executor and the failed stages are recomputed.
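
For context, here is a minimal sketch of the kind of mitigation I am
considering: spreading Spark's scratch space across several disks so that one
full volume is less likely to stall the whole job. The mount points /data1 and
/data2 are placeholders, and I have not verified that this avoids the
stuck-job behaviour:

import org.apache.spark.{SparkConf, SparkContext}

// Placeholder paths -- substitute the real mounts on each worker.
// spark.local.dir accepts a comma-separated list of directories;
// Spark spreads shuffle and spill files (the temp_shuffle_* files
// seen in the stack trace above) across all of them.
val conf = new SparkConf()
  .setAppName("shuffle-dir-spread-sketch") // hypothetical app name
  .set("spark.local.dir", "/data1/spark-tmp,/data2/spark-tmp")

val sc = new SparkContext(conf)

(Note that on a standalone cluster the worker's SPARK_LOCAL_DIRS environment
variable, if set, overrides spark.local.dir, so the directories may need to be
configured there instead.)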


How can I get rid of this problem, i.e. why does my job get stuck when the
disk is full on just one of the workers?


Cheers !!!
Abhi
