spark-issues mailing list archives

From "George Papa (Jira)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-29321) Possible memory leak in Spark
Date Tue, 01 Oct 2019 20:26:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-29321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942293#comment-16942293 ]

George Papa commented on SPARK-29321:
-------------------------------------

I ran the code in the snippet (without any sleep time, in order to see the results faster) and recorded the JVM memory usage for approximately 1 hour on both Spark 2.4.4 and your branch with the patch.

Spark JVM memory with Spark 2.4.4:
||Time||RES||SHR||MEM%||
|1min|{color:#de350b}1349{color}|32724|1.5|
|3min|{color:#de350b}1936{color}|32724|2.2|
|5min|{color:#de350b}2506{color}|32724|2.6|
|7min|{color:#de350b}2564{color}|32724|2.7|
|9min|{color:#de350b}2584{color}|32724|2.7|
|11min|{color:#de350b}2585{color}|32724|2.7|
|13min|{color:#de350b}2592{color}|32724|2.7|
|15min|{color:#de350b}2591{color}|32724|2.7|
|17min|{color:#de350b}2591{color}|32724|2.7|
|30min|{color:#de350b}2600{color}|32724|2.7|
|1h|{color:#de350b}2618{color}|32724|2.7|

 

Spark JVM memory with the patch ([GitHub Pull Request #25973|https://github.com/apache/spark/pull/25973]):
||Time||RES||SHR||MEM%||
|1min|{color:#de350b}1134{color}|25380|1.4|
|3min|{color:#de350b}1520{color}|25380|1.6|
|5min|{color:#de350b}1570{color}|25380|1.6|
|7min|{color:#de350b}1598{color}|25380|1.7|
|9min|{color:#de350b}1613{color}|25380|1.7|
|11min|{color:#de350b}1616{color}|25380|1.7|
|15min|{color:#de350b}1620{color}|25380|1.7|
|17min|{color:#de350b}1625{color}|25380|1.7|
|30min|{color:#de350b}1629{color}|25380|1.7|
|1h|{color:#de350b}1660{color}|25380|1.7|

 

As you can see, the RES memory is slowly increasing over time in both cases. Also, when I tested with a real streaming application in a testing environment, after some hours the persisted dataframes overflowed the memory and spilled to disk.

*NOTE:* You can easily reproduce the above behavior by running the snippet code (I prefer to run it without any sleep delay) and tracking the JVM memory with the top or htop command.
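
For reference, a minimal sketch of how the RES values in the tables above could be sampled programmatically instead of reading them off top by hand (this assumes a Linux /proc filesystem; the PID, interval and sample count are placeholders):
{code:python}
# Rough sketch: sample a JVM process's resident memory (RES) from /proc,
# as an alternative to watching top/htop manually.
import time

def sample_rss_mb(pid, interval_s=60, samples=60):
    for _ in range(samples):
        with open("/proc/{0}/status".format(pid)) as status:
            for line in status:
                if line.startswith("VmRSS:"):
                    kb = int(line.split()[1])  # VmRSS is reported in kB
                    print("RES: {0:.0f} MB".format(kb / 1024.0))
                    break
        time.sleep(interval_s)

# Example: sample_rss_mb(<driver JVM pid>)
{code}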

 

> Possible memory leak in Spark
> -----------------------------
>
>                 Key: SPARK-29321
>                 URL: https://issues.apache.org/jira/browse/SPARK-29321
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.3
>            Reporter: George Papa
>            Priority: Major
>
> I used Spark 2.1.1 and upgraded to newer versions. After Spark version 2.3.3, I observed from the Spark UI that the driver memory is{color:#ff0000} increasing continuously{color}.
> In more detail, the driver memory and the executors' memory show the same used storage memory, and after each iteration the storage memory increases. You can reproduce this behavior by running the following snippet code. The following example is very simple, without any dataframe persistence, but the memory consumption is not stable as it was in former Spark versions (specifically up to Spark 2.3.2).
> Also, I tested with the Spark Streaming and Structured Streaming APIs and saw the same behavior. I tested with an existing application which reads from a Kafka source, does some aggregations, persists dataframes and then unpersists them. The persist and unpersist work correctly: I see the dataframes in the Storage tab in the Spark UI, and after the unpersist all dataframes are removed. But after the unpersist the executors' memory is not zero; it has the same value as the driver memory. This behavior also affects the application performance, because the executors' memory increases as the driver's does, and after a while the persisted dataframes no longer fit in the executors' memory and spill to disk.
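> A minimal sketch of the persist/unpersist pattern described above (the input dataframe, column names and aggregation are placeholders, not the actual application code):
> {code:python}
> from pyspark.sql import SparkSession
> from pyspark.sql import functions as F
>
> spark = SparkSession.builder.appName("PersistUnpersistSketch").getOrCreate()
>
> # Placeholder dataframe standing in for the output of the Kafka-based pipeline
> df = spark.createDataFrame([("a", 1), ("a", 2), ("b", 3)], ["key", "value"])
>
> agg = df.groupBy("key").agg(F.count("*").alias("cnt")).persist()
> print("Rows: {0}".format(agg.count()))  # materializes the cache; shows up in the Storage tab
> agg.unpersist()                         # the entry is removed from the Storage tab
> {code}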
> Another error which I hit after a long run was {color:#ff0000}java.lang.OutOfMemoryError: GC overhead limit exceeded{color}, but I don't know whether it is related to the above behavior or not.
>  
> *HOW TO REPRODUCE THIS BEHAVIOR:*
> Create a very simple application (streaming count_file.py) in order to reproduce this behavior. This application reads CSV files from a directory, counts the rows and then removes the processed files.
> {code:python}
> import time
> import os
> from pyspark.sql import SparkSession
> from pyspark.sql import functions as F
> from pyspark.sql import types as T
>
> target_dir = "..."  # directory containing the CSV files to process
> spark = SparkSession.builder.appName("DataframeCount").getOrCreate()
>
> while True:
>     for f in os.listdir(target_dir):
>         # read each CSV file and count its rows
>         df = spark.read.load(target_dir + f, format="csv")
>         print("Number of records: {0}".format(df.count()))
>         time.sleep(15)
> {code}
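> The snippet above leaves the processed files in place; a minimal sketch of the file-removal step mentioned in the description (not part of the original snippet) would be:
> {code:python}
> # Hypothetical cleanup inside the for loop, matching the "remove the processed
> # files" step described above; not part of the original snippet.
> os.remove(os.path.join(target_dir, f))
> {code}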
> Submit command:
> {code:java}
> spark-submit \
>   --master spark://xxx.xxx.xx.xxx \
>   --deploy-mode client \
>   --executor-memory 4g \
>   --executor-cores 3 \
>   streaming count_file.py
> {code}
>  
> *TESTED CASES WITH THE SAME BEHAVIOR:*
>  * Default settings (spark-defaults.conf)
>  * {{spark.cleaner.periodicGC.interval}} set to 1min (or less)
>  * {{spark.cleaner.referenceTracking.blocking}} set to false (a sketch of how these cleaner settings can be applied follows after this list)
>  * Running the application in cluster mode
>  * Increasing/decreasing the resources of the executors and driver
>  * extraJavaOptions on driver and executor: -XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35 -XX:ConcGCThreads=12
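> A minimal sketch of how the cleaner settings listed above can be applied when building the session (the values are the ones tried above):
> {code:python}
> from pyspark.sql import SparkSession
>
> # Illustration only: applies the cleaner settings tried above at session build time
> spark = (SparkSession.builder
>          .appName("DataframeCount")
>          .config("spark.cleaner.periodicGC.interval", "1min")
>          .config("spark.cleaner.referenceTracking.blocking", "false")
>          .getOrCreate())
> {code}
> The same settings can also be passed with --conf on spark-submit or set in spark-defaults.conf.
>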
> *DEPENDENCIES*
>  * Operating system: Ubuntu 16.04.3 LTS
>  * Java: jdk1.8.0_131 (tested also with jdk1.8.0_221)
>  * Python: Python 2.7.12
>  
> *NOTE:* In Spark 2.1.1 the driver memory consumption (Storage Memory tab) was extremely low, and after the ContextCleaner and BlockManager ran, the memory decreased.



