spark-issues mailing list archives

From "Dongjoon Hyun (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-28294) Support `spark.history.fs.cleaner.maxNum` configuration
Date Mon, 08 Jul 2019 06:58:00 GMT
Dongjoon Hyun created SPARK-28294:
-------------------------------------

             Summary: Support `spark.history.fs.cleaner.maxNum` configuration
                 Key: SPARK-28294
                 URL: https://issues.apache.org/jira/browse/SPARK-28294
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.0.0
            Reporter: Dongjoon Hyun


Up to now, Apache Spark has maintained the event log directory only by a time-based policy, `spark.history.fs.cleaner.maxAge`.
However, this has two issues.

1. Some file systems limit the maximum number of files in a single directory.
For example, HDFS's `dfs.namenode.fs-limits.max-directory-items` defaults to 1024 * 1024 (1,048,576).
- https://hadoop.apache.org/docs/r3.2.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

2. Spark is sometimes unable to clean up some old log files due to permission issues.

To handle both (1) and (2), this issue aims to support an additional number-based policy configuration
for the event log directory, `spark.history.fs.cleaner.maxNum`. Spark will try to keep the
number of files in the event log directory at or below this limit.
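As a sketch of how the two policies might be combined, the following `spark-defaults.conf` fragment pairs the existing `spark.history.fs.cleaner.enabled` and `spark.history.fs.cleaner.maxAge` settings with the proposed `spark.history.fs.cleaner.maxNum`; the `500000` value is an illustrative assumption, chosen to stay well under the HDFS per-directory default of 1,048,576 items:

```
# spark-defaults.conf (sketch)
# Existing time-based cleaner settings:
spark.history.fs.cleaner.enabled   true
spark.history.fs.cleaner.maxAge    7d
# Proposed number-based policy (value is a hypothetical example):
spark.history.fs.cleaner.maxNum    500000
```

With both set, logs older than `maxAge` would be removed as before, and the cleaner would additionally attempt to cap the directory at `maxNum` files even when some older files cannot be deleted for permission reasons.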



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

