spark-user mailing list archives

From Stein Welberg <>
Subject Buffer/cache exhaustion Spark standalone inside a Docker container
Date Wed, 06 Dec 2017 16:34:11 GMT
Hi All!

I have a very weird memory issue (which is what a lot of people will most likely say ;-))
with Spark running in standalone mode inside a Docker container. Our setup is as follows:
we have a Docker container with a Spring Boot application that runs Spark in standalone
mode. This Spring Boot app also contains a few scheduled tasks, and these tasks trigger Spark
jobs. The Spark jobs scrape a SQL database, shuffle the data a bit, and then write the results
to a different SQL table. Our current data set is very small (the largest table contains a
few million rows).

The problem is that the Docker host (a CentOS VM) that runs the Docker container crashes after
a while because its memory gets exhausted. I have currently limited the Spark memory usage
to 512M (I have set both executor and driver memory), and in the Spark UI I can see that the
largest job only takes about 10 MB of memory.

After digging a bit further I noticed that Spark eats up all the buffer/cache memory on
the machine. Forcing Linux to drop caches manually (echo 2 > /proc/sys/vm/drop_caches,
which clears the dentries and inodes) makes the cache usage drop considerably, but if I don't
keep doing this regularly the cache usage slowly creeps back up until all memory is
used by buffer/cache.
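
To make the buffer/cache growth described above easier to track over time, here is a minimal
sketch that splits the numbers from /proc/meminfo into page cache and reclaimable slab. It
assumes a Linux host and assumes that the dentries and inodes freed by "echo 2" are accounted
under the SReclaimable field. The function names parse_meminfo and cache_breakdown are
hypothetical helpers for illustration, not part of Spark or Docker.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integers (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def cache_breakdown(info):
    """Separate page cache from reclaimable slab (assumed to hold dentries/inodes)."""
    return {
        "page_cache_kb": info.get("Buffers", 0) + info.get("Cached", 0),
        "slab_reclaimable_kb": info.get("SReclaimable", 0),
        "mem_available_kb": info.get("MemAvailable", 0),
    }


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # only present on Linux
        print(cache_breakdown(parse_meminfo(meminfo.read_text())))
```

Running this periodically on the Docker host, before and after dropping caches, should show
whether the growth sits mostly in slab_reclaimable_kb or in page_cache_kb.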

Does anyone have an idea what I might be doing wrong / what is going on here?

Big thanks in advance for any help!

Stein Welberg
