+1 for using S3A.

It would also depend on what format you're using. I agree with Steve that Parquet, for instance, is a good option. If you're using plain text files, some people use GZ files but they cannot be partitioned, thus putting a lot of pressure on the driver. It doesn't look like this is the issue you're running into, though, because it would not be a progressive slow down, but please provide as much detail as possible about your app.

The cache could be an issue but the OOM would come from an executor, not from the driver. 

From what you're saying, Keith, it indeed looks like some memory is not being freed. Seeing the code would help. If you can, also send all the logs (with Spark at least in INFO level).


On Fri, Nov 18, 2016 at 10:08 AM, Nathan Lande <nathanlande@gmail.com> wrote:

+1 to not threading.

What does your load look like? If you are loading many files and cacheing them in N rdds rather than 1 rdd this could be an issue.

If the above two things don't fix your oom issue, without knowing anything else about your job, I would focus on your cacheing strategy as a potential culprit. Try running without any cacheing to isolate the issue; bad cacheing strategy is the source of oom issues for me most of the time.

On Nov 18, 2016 6:31 AM, "Keith Bourgoin" <keith@parsely.com> wrote:
Hi Alexis,

Thanks for the response. I've been working with Irina on trying to sort this issue out.

We thread the file processing to amortize the cost of things like getting files from S3. It's a pattern we've seen recommended in many places, but I don't have any of those links handy.  The problem isn't the threading, per se, but clearly some sort of memory leak in the driver itself.  Each file is a self-contained unit of work, so once it's done all memory related to it should be freed. Nothing in the script itself grows over time, so if it can do 10 concurrently, it should be able to run like that forever.

I've hit this same issue working on another Spark app which wasn't threaded, but produced tens of thousands of jobs. Eventually, the Spark UI would get slow, then unresponsive, and then be killed due to OOM.

I'll try to cook up some examples of this today, threaded and not. We were hoping that someone had seen this before and it rung a bell. Maybe there's a setting to clean up info from old jobs that we can adjust.



On Thu, Nov 17, 2016 at 9:50 PM Alexis Seigneurin <aseigneurin@ipponusa.com> wrote:
Hi Irina,

I would question the use of multiple threads in your application. Since Spark is going to run the processing of each DataFrame on all the cores of your cluster, the processes will be competing for resources. In fact, they would not only compete for CPU cores but also for memory.

Spark is designed to run your processes in a sequence, and each process will be run in a distributed manner (multiple threads on multiple instances). I would suggest to follow this principle.

Feel free to share to code if you can. It's always helpful so that we can give better advice.


On Thu, Nov 17, 2016 at 8:51 PM, Irina Truong <irina@parsely.com> wrote:

We have an application that reads text files, converts them to dataframes, and saves them in Parquet format. The application runs fine when processing a few files, but we have several thousand produced every day. When running the job for all files, we have spark-submit killed on OOM:

# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill -9 %p"
#   Executing /bin/sh -c "kill -9 27226"...

The job is written in Python. We’re running it in Amazon EMR 5.0 (Spark 2.0.0) with spark-submit. We’re using a cluster with a master c3.2xlarge instance (8 cores and 15g of RAM) and 3 core c3.4xlarge instances (16 cores and 30g of RAM each). Spark config settings are as follows:

('spark.serializer', 'org.apache.spark.serializer.KryoSerializer'),

('spark.executors.instances', '3'),

('spark.yarn.executor.memoryOverhead', '9g'),

('spark.executor.cores', '15'),

('spark.executor.memory', '12g'),

('spark.scheduler.mode', 'FIFO'),

('spark.cleaner.ttl', '1800'),

The job processes each file in a thread, and we have 10 threads running concurrently. The process will OOM after about 4 hours, at which point Spark has processed over 20,000 jobs.

It seems like the driver is running out of memory, but each individual job is quite small. Are there any known memory leaks for long-running Spark applications on Yarn?

Alexis Seigneurin
Managing Consultant

Alexis Seigneurin
Managing Consultant