spark-issues mailing list archives

From "Abhishek Modi (Jira)" <j...@apache.org>
Subject [jira] [Created] (SPARK-29472) Mechanism for Excluding Jars at Launch for YARN
Date Tue, 15 Oct 2019 02:12:00 GMT
Abhishek Modi created SPARK-29472:
-------------------------------------

             Summary: Mechanism for Excluding Jars at Launch for YARN
                 Key: SPARK-29472
                 URL: https://issues.apache.org/jira/browse/SPARK-29472
             Project: Spark
          Issue Type: New Feature
          Components: YARN
    Affects Versions: 2.4.4
            Reporter: Abhishek Modi


*Summary*

It would be convenient if there were an easy way to exclude jars from Spark’s classpath
at launch time. This would complement the way in which jars can be added to the classpath
using {{extraClassPath}}.
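For reference, adding jars today is typically done with the {{extraClassPath}} settings (a minimal {{spark-submit}} sketch; the jar path and class name are illustrative):

```shell
# Prepend a jar to the driver and executor classpaths (paths are illustrative):
spark-submit \
  --conf spark.driver.extraClassPath=/opt/libs/parquet-hadoop-1.11.0.jar \
  --conf spark.executor.extraClassPath=/opt/libs/parquet-hadoop-1.11.0.jar \
  --class com.example.Main app.jar
```

There is no analogous setting for removing a jar that is already in the distribution, which is the gap this issue describes.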

 

*Context*

The Spark build contains its dependency jars in the {{/jars}} directory. These jars become
part of the executor’s classpath. By default on YARN, these jars are packaged and distributed
to containers at launch ({{spark-submit}}) time.

 

While developing Spark applications, customers sometimes need to debug using different versions
of dependencies. This becomes difficult if the dependency (e.g. Parquet 1.11.0) is one that
Spark already ships in {{/jars}} (e.g. Parquet 1.10.1 in Spark 2.4), because the version bundled
with Spark is loaded preferentially.

 

Configurations such as {{userClassPathFirst}} are available, but they often come with side
effects of their own. For example, if the customer’s build includes Avro they will likely
see {{Caused by: java.lang.LinkageError: loader constraint violation: when resolving method
"org.apache.spark.SparkConf.registerAvroSchemas(Lscala/collection/Seq;)Lorg/apache/spark/SparkConf;"
the class loader (instance of org/apache/spark/util/ChildFirstURLClassLoader) of the current
class, com/uber/marmaray/common/spark/SparkFactory, and the class loader (instance of sun/misc/Launcher$AppClassLoader)
for the method's defining class, org/apache/spark/SparkConf, have different Class objects
for the type scala/collection/Seq used in the signature}}. Resolving such issues often takes
many hours.

 

To work around these issues, customers often download the Spark build, remove the offending
jars, and then run {{spark-submit}}. In other cases, customers cannot run {{spark-submit}}
directly because it is gated behind a Spark Job Server; they may instead download the build,
remove the jars, and use configurations such as {{spark.yarn.dist.jars}} or
{{spark.yarn.dist.archives}}. Both options are undesirable: they are operationally heavy,
error prone, and often leave the customer’s Spark builds out of sync with the authoritative
build.
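The manual workaround described above amounts to something like the following (a hedged sketch; the distribution filename and jar names are illustrative):

```shell
# 1. Unpack a private copy of the Spark distribution (filename is illustrative)
tar -xzf spark-2.4.4-bin-hadoop2.7.tgz
# 2. Remove the conflicting jars from the bundled jars/ directory
rm spark-2.4.4-bin-hadoop2.7/jars/parquet-*.jar
# 3. Distribute the modified jar set to YARN containers yourself
#    via spark.yarn.dist.jars (or spark.yarn.dist.archives),
#    e.g. spark-submit --conf spark.yarn.dist.jars=<comma-separated jar list> ...
```

Every such private copy must then be kept in sync with the authoritative build by hand, which is the operational burden described above.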

 

*Solution*

I’d like to propose adding a {{spark.yarn.jars.exclusionRegex}} configuration. Customers
could provide a regex such as {{.\*parquet.\*}}, and jar files matching it would be excluded
from the driver and executor classpaths.
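The intended behaviour could be sketched as follows ({{spark.yarn.jars.exclusionRegex}} does not exist; its name and semantics are this proposal, and the jar names are illustrative):

```shell
# Proposed usage (not a current Spark config):
#   spark-submit --conf spark.yarn.jars.exclusionRegex='.*parquet.*' ...
#
# The intended matching behaviour, illustrated on a sample jar listing:
jars='parquet-hadoop-1.10.1.jar
parquet-column-1.10.1.jar
spark-core_2.11-2.4.4.jar'
# Keep only the jars that do NOT match the exclusion regex
printf '%s\n' "$jars" | grep -Ev '.*parquet.*'
# -> spark-core_2.11-2.4.4.jar
```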



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

