spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-25262) Make Spark local dir volumes configurable with Spark on Kubernetes
Date Fri, 28 Sep 2018 19:58:01 GMT

    [ https://issues.apache.org/jira/browse/SPARK-25262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16632491#comment-16632491 ]

Apache Spark commented on SPARK-25262:
--------------------------------------

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/22584

> Make Spark local dir volumes configurable with Spark on Kubernetes
> ------------------------------------------------------------------
>
>                 Key: SPARK-25262
>                 URL: https://issues.apache.org/jira/browse/SPARK-25262
>             Project: Spark
>          Issue Type: Improvement
>          Components: Kubernetes
>    Affects Versions: 2.3.0, 2.3.1
>            Reporter: Rob Vesse
>            Priority: Major
>
> As discussed during review of the design document for SPARK-24434, pod templates
> will allow more in-depth customisation of Spark on Kubernetes, but some things
> still cannot be modified because Spark code generates pod specs in very specific ways.
> The particular issue identified relates to the handling of {{spark.local.dirs}}, which is
> done by {{LocalDirsFeatureStep.scala}}. For each directory specified, or a single default
> if none is specified explicitly, it creates a Kubernetes {{emptyDir}} volume. As noted in the
> Kubernetes documentation, this will be backed by the node storage
> (https://kubernetes.io/docs/concepts/storage/volumes/#emptydir).
> In some compute environments this may be extremely undesirable. For example, with diskless
> compute resources the node storage will likely be a non-performant remote-mounted disk, often
> with limited capacity. For such environments it would likely be better to set {{medium: Memory}}
> on the volume, per the K8S documentation, to use a {{tmpfs}} volume instead.
> Another closely related issue is that users might want to use a different volume type
> to back the local directories, and there is currently no way to do that.
> Pod templates will not really solve either of these issues because Spark will always
> attempt to generate a new volume for each local directory and will always set these
> as {{emptyDir}}.
> Therefore the proposal is to make two changes to {{LocalDirsFeatureStep}}:
> * Provide a new config setting to enable using {{tmpfs}}-backed {{emptyDir}} volumes
> * Modify the logic to check whether a volume is already defined with the name and, if
> so, skip generating a volume definition for it
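
The two proposed changes can be sketched roughly as follows. This is a minimal Python illustration of the logic only, not Spark's actual Scala implementation in LocalDirsFeatureStep. The function name build_local_dir_volumes, the use_tmpfs flag, and the spark-local-dir-N volume naming are all assumptions made for the example; the volume dictionaries simply mirror the shape of a Kubernetes emptyDir volume entry.

```python
from typing import Dict, List


def build_local_dir_volumes(
    local_dirs: List[str],
    existing_volume_names: List[str],
    use_tmpfs: bool = False,
) -> List[Dict]:
    """Return emptyDir volume definitions for the given local directories.

    Hypothetical helper: the name, the parameters and the volume naming
    scheme are illustrative assumptions, not Spark's real API.
    """
    existing = set(existing_volume_names)
    volumes = []
    for index, _path in enumerate(local_dirs, start=1):
        name = f"spark-local-dir-{index}"  # assumed naming scheme
        if name in existing:
            # Proposed change 2: a volume with this name is already defined
            # (for example by a pod template), so do not generate another.
            continue
        # Proposed change 1: optionally back the emptyDir with tmpfs by
        # setting medium: Memory, as described in the Kubernetes docs.
        empty_dir = {"medium": "Memory"} if use_tmpfs else {}
        volumes.append({"name": name, "emptyDir": empty_dir})
    return volumes
```

With two local directories and a pre-existing volume named spark-local-dir-1, only a definition for spark-local-dir-2 would be generated, and it would carry medium: Memory when use_tmpfs is enabled.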



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
