spark-issues mailing list archives

From "Dongjoon Hyun (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-26082) Misnaming of spark.mesos.fetch(er)Cache.enable in MesosClusterScheduler
Date Thu, 07 Feb 2019 09:18:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-26082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16762499#comment-16762499 ]

Dongjoon Hyun edited comment on SPARK-26082 at 2/7/19 9:17 AM:
---------------------------------------------------------------

Since this bug was introduced by SPARK-15994, which was added in Spark 2.1.0, I removed 2.0.x from
the affected versions.

BTW, Spark 2.2.x is EOL (https://spark.apache.org/versioning-policy.html).


was (Author: dongjoon):
Since this bug was introduced by SPARK-15994, which was added in Spark 2.1.0, I removed 2.0.x from
the affected versions.

> Misnaming of spark.mesos.fetch(er)Cache.enable in MesosClusterScheduler
> -----------------------------------------------------------------------
>
>                 Key: SPARK-26082
>                 URL: https://issues.apache.org/jira/browse/SPARK-26082
>             Project: Spark
>          Issue Type: Bug
>          Components: Mesos
>    Affects Versions: 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1, 2.3.2
>            Reporter: Martin Loncaric
>            Priority: Major
>
> Currently in [docs|https://spark.apache.org/docs/latest/running-on-mesos.html]:
> {quote}spark.mesos.fetcherCache.enable / false / If set to `true`, all URIs (example: `spark.executor.uri`, `spark.mesos.uris`) will be cached by the Mesos Fetcher Cache
> {quote}
> Currently in {{MesosClusterScheduler.scala}} (which passes the parameter to the driver):
> {{private val useFetchCache = conf.getBoolean("spark.mesos.fetchCache.enable", false)}}
> Currently in {{MesosCoarseGrainedSchedulerBackend.scala}} (which passes the Mesos caching parameter to executors):
> {{private val useFetcherCache = conf.getBoolean("spark.mesos.fetcherCache.enable", false)}}
> This naming discrepancy dates back to version 2.0.0 ([jira|http://mail-archives.apache.org/mod_mbox/spark-issues/201606.mbox/%3CJIRA.12979909.1466099309000.9921.1466101026233@Atlassian.JIRA%3E]).
> This means that when {{spark.mesos.fetcherCache.enable=true}} is specified, the Mesos cache will be used only for executors, and not for drivers.
> IMPACT:
> Not caching these driver files (typically including at least the Spark binaries, a custom jar, and additional dependencies) adds considerable network traffic and startup time when frequently running Spark applications on a Mesos cluster. In addition, since files that are extracted, like {{spark-x.x.x-bin-*.tgz}}, are also copied and left in the sandbox when the cache is off (rather than extracted directly without an extra copy), disk usage can increase considerably. Users can currently work around this by specifying the {{spark.mesos.fetchCache.enable}} option, but this should at least be mentioned in the documentation.
> SUGGESTED FIX:
> Add {{spark.mesos.fetchCache.enable}} to the documentation for versions 2 - 2.4, and update {{MesosClusterScheduler.scala}} to use {{spark.mesos.fetcherCache.enable}} going forward (literally a one-line change).
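
The key mismatch described in the issue can be illustrated with a small, self-contained sketch. This is not the actual Spark code: the `getBoolean` helper below is a hypothetical stand-in for `SparkConf.getBoolean` backed by a plain `Map`, and the object and method names are assumptions made up for illustration. Only the two configuration keys and the default of `false` come from the report.

```scala
// Minimal sketch, assuming a Map-backed stand-in for SparkConf so it runs without Spark.
object FetcherCacheKeySketch {
  // Key documented in "Running on Mesos" and read on the executor side.
  val DocumentedKey = "spark.mesos.fetcherCache.enable"
  // Key read on the driver side by MesosClusterScheduler, per the report.
  val MisnamedKey = "spark.mesos.fetchCache.enable"

  // Hypothetical helper mimicking SparkConf.getBoolean(key, default).
  def getBoolean(conf: Map[String, String], key: String, default: Boolean): Boolean =
    conf.get(key).map(_.trim.toBoolean).getOrElse(default)

  // Driver-side lookup as quoted in the report: reads the misnamed key.
  def driverUsesCacheBefore(conf: Map[String, String]): Boolean =
    getBoolean(conf, MisnamedKey, default = false)

  // Driver-side lookup after the suggested one-line change: reads the documented key.
  def driverUsesCacheAfter(conf: Map[String, String]): Boolean =
    getBoolean(conf, DocumentedKey, default = false)

  def main(args: Array[String]): Unit = {
    // A user who follows the documentation sets only the documented key.
    val conf = Map(DocumentedKey -> "true")
    println(driverUsesCacheBefore(conf)) // false: the driver ignores the documented key
    println(driverUsesCacheAfter(conf))  // true: driver and executors agree
  }
}
```

Under the same assumptions, the workaround mentioned in the report corresponds to a configuration that sets both keys to `true`, so that the unmodified driver-side lookup also returns `true`.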



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

