spark-issues mailing list archives

From "Matt Cheah (JIRA)" <>
Subject [jira] [Commented] (SPARK-25262) Make Spark local dir volumes configurable with Spark on Kubernetes
Date Thu, 06 Sep 2018 23:21:00 GMT


Matt Cheah commented on SPARK-25262:

For [] we allow using tmpfs, but other volume types aren't allowed. Is it fine to close
this issue, or do we want to keep it open to track work to support other volume types there?

> Make Spark local dir volumes configurable with Spark on Kubernetes
> ------------------------------------------------------------------
>                 Key: SPARK-25262
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: Kubernetes
>    Affects Versions: 2.3.0, 2.3.1
>            Reporter: Rob Vesse
>            Priority: Major
> As discussed during review of the design document for SPARK-24434, while pod templates
will provide more in-depth customisation for Spark on Kubernetes, there are some things
that cannot be modified because Spark code generates pod specs in very specific ways.
> The particular issue identified relates to the handling of {{spark.local.dirs}}, which is
done by {{LocalDirsFeatureStep.scala}}.  For each directory specified, or for a single default
if there is no explicit specification, it creates a Kubernetes {{emptyDir}} volume.  As noted in the
Kubernetes documentation, this will be backed by the node storage.
 In some compute environments this may be extremely undesirable.  For example, with diskless
compute resources the node storage will likely be a non-performant remote-mounted disk, often
with limited capacity.  For such environments it would likely be better to set {{medium: Memory}}
on the volume, per the Kubernetes documentation, to use a {{tmpfs}} volume instead.
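As a concrete illustration, the tmpfs-backed variant described above looks like this in a pod spec (the volume name here is illustrative; `medium: Memory` is the field documented by Kubernetes for memory-backed `emptyDir` volumes):

```yaml
volumes:
  - name: spark-local-dir-1
    emptyDir:
      medium: Memory   # back the volume with tmpfs instead of node storage
```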
> Another closely related issue is that users might want to use a different volume type
to back the local directories, and there is currently no way to do that.
> Pod templates will not really solve either of these issues, because Spark will always
attempt to generate a new volume for each local directory and will always set these
as {{emptyDir}}.
> Therefore the proposal is to make two changes to {{LocalDirsFeatureStep}}:
> * Provide a new config setting to enable using {{tmpfs}}-backed {{emptyDir}} volumes
> * Modify the logic to check whether a volume with the expected name is already defined and,
if so, skip generating a volume definition for it
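The two proposed changes can be sketched roughly as follows. This is an illustrative model only, not Spark's actual API: the class, record, and method names are hypothetical, and the `spark-local-dir-N` naming convention is assumed for the generated volumes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the proposed LocalDirsFeatureStep behaviour:
// reuse a pre-defined volume when one with the expected name already
// exists in the pod spec, otherwise generate an emptyDir volume,
// tmpfs-backed ("medium: Memory") when the new config flag is set.
public class LocalDirVolumesSketch {
    // medium is "Memory" for tmpfs, or null for default node storage
    public record Volume(String name, String medium) {}

    public static List<Volume> localDirVolumes(
            List<String> localDirs, Set<String> existingVolumeNames, boolean useTmpfs) {
        List<Volume> generated = new ArrayList<>();
        for (int i = 0; i < localDirs.size(); i++) {
            String name = "spark-local-dir-" + (i + 1);
            if (existingVolumeNames.contains(name)) {
                continue; // volume already defined by the user: skip generating one
            }
            generated.add(new Volume(name, useTmpfs ? "Memory" : null));
        }
        return generated;
    }
}
```

With two local dirs and a user-supplied `spark-local-dir-1` volume, only `spark-local-dir-2` would be generated, tmpfs-backed when the flag is on.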

This message was sent by Atlassian JIRA
