spark-issues mailing list archives

From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (SPARK-1881) Executor caching
Date Sat, 09 Jan 2016 13:01:39 GMT

     [ https://issues.apache.org/jira/browse/SPARK-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-1881.
------------------------------
    Resolution: Not A Problem

> Executor caching
> ----------------
>
>                 Key: SPARK-1881
>                 URL: https://issues.apache.org/jira/browse/SPARK-1881
>             Project: Spark
>          Issue Type: Improvement
>          Components: Mesos
>    Affects Versions: 1.0.0
>         Environment: centos 6.5, mesos 0.18.1
>            Reporter: nigel
>            Priority: Minor
>
> The problem is that the executor is copied for each run. We have a cluster where the
> disks are of moderate size and each executor is nearly 170MB. The executor is slow to copy,
> and multiple runs take up a significant amount of space.
> The improvement would be to make it smaller.
> Currently the examples are included in the executor, although they are not needed for
> execution. It is easy to take them out, but it might be better not to include them in the
> default build.
> Another improvement might be to cache the executor jar. The script below makes a
> 'sparklite' executor that downloads the jar file only once (until the tmp dir is wiped).
> The (small) scripts are downloaded each time as before.
> This example would need more work: the source and destination are currently hard-coded,
> and it might be a good idea to check file dates and/or checksums in case someone uploads
> a jar with the same version.
> This might be a bit redundant, depending on what happens with other work on executor
> caching.
> Comments welcome.
> --------------------------
> mkdir sparklite
> echo '58c58
> <   if [ -f "$FWDIR/RELEASE" ]; then
> ---
> >   if [ -f "$FWDIR/RELEASE" ] && [ -f "$FWDIR"/lib/spark-assembly*hadoop*.jar ]; then
> 60c60
> <   else
> ---
> >   elif [ -f "$ASSEMBLY_DIR"/spark-assembly*hadoop*.jar ]; then
> 61a62,68
> >   else
> > #Try the local one. If not there, download from hdfs
> >     if [ ! -f /tmp/sparklite/spark-assembly*hadoop*.jar ]; then
> >         mkdir /tmp/sparklite 2>/dev/null
> >         hdfs dfs -get /spark/spark-assembly*-hadoop*.jar /tmp/sparklite/
> >     fi    
> >     ASSEMBLY_JAR=$(ls /tmp/sparklite/spark-assembly*hadoop*.jar 2>/dev/null)
> 64a72
> > ' > cc.patch
> tar -C sparklite -xf spark-1.0.0.tgz 
> cd sparklite
> hdfs dfs -put ./spark-1.0.0/lib/spark-assembly-1.0.0-SNAPSHOT-hadoop2.4.0.jar /spark/
> rm -f spark-1.0.0/lib/*assembly*
> rm -f spark-1.0.0/lib/*example*
> rm -f spark-1.0.0/bin/*.cmd
> rm -rf spark-1.0.0/ec2
> rm -rf spark-1.0.0/lib
> rm -rf spark-1.0.0/conf
> rm -rf spark-1.0.0/examples
> patch spark-1.0.0/bin/compute-classpath.sh < cc.patch
> rm -f spark-1.0.0.tgz
> tar zcf spark-1.0.0.tgz spark-1.0.0
> hdfs dfs -rm /spark/spark-1.0.0.tgz
> hdfs dfs -put ./spark-1.0.0.tgz /spark/
> ------------------------
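
The description suggests checking checksums in case a jar is re-uploaded under the same version. A minimal sketch of that idea follows; it is not part of the reporter's script. It assumes a sidecar file named "<jar>.sha256" is published next to the jar (a hypothetical convention), that sha256sum is available on the node, and that FETCH_CMD copies "source destination" (defaulting to "hdfs dfs -get"). The function name refresh_cached_jar is also hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: re-download a cached jar only when its checksum has changed.
# Assumptions (not from the original script): a sidecar "<jar>.sha256" file
# sits next to the jar, and FETCH_CMD copies "source destination".
FETCH_CMD=${FETCH_CMD:-"hdfs dfs -get"}

# refresh_cached_jar SRC_JAR CACHE_DIR
# Prints "cached" if the local copy already matches the published digest,
# "fetched" if the jar had to be downloaded. Returns non-zero on failure.
refresh_cached_jar() {
    local src="$1" cache_dir="$2"
    local name="${src##*/}"
    local tmp_dir expected actual

    mkdir -p "$cache_dir" || return 1
    tmp_dir=$(mktemp -d) || return 1

    # Fetch the small sidecar first; it is cheap compared to the jar itself.
    if ! $FETCH_CMD "$src.sha256" "$tmp_dir/" >/dev/null; then
        rm -rf "$tmp_dir"
        return 1
    fi
    expected=$(cut -d' ' -f1 "$tmp_dir/$name.sha256")
    rm -rf "$tmp_dir"

    # Reuse the cached jar only if its digest matches the published one.
    if [ -f "$cache_dir/$name" ]; then
        actual=$(sha256sum "$cache_dir/$name" | cut -d' ' -f1)
        if [ "$actual" = "$expected" ]; then
            echo cached
            return 0
        fi
        rm -f "$cache_dir/$name"   # stale copy: same name, different content
    fi

    # Download, then verify the fresh copy before trusting it.
    $FETCH_CMD "$src" "$cache_dir/" >/dev/null || return 1
    actual=$(sha256sum "$cache_dir/$name" | cut -d' ' -f1)
    if [ "$actual" != "$expected" ]; then
        rm -f "$cache_dir/$name"
        return 1
    fi
    echo fetched
}
```

Under these assumptions, the download branch in compute-classpath.sh could call something like refresh_cached_jar with the HDFS path of the jar and /tmp/sparklite, at the cost of one small extra fetch per run.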



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

