spark-user mailing list archives

From Artemis User <arte...@dtechspace.com>
Subject Re: Submitting extra jars on spark applications on yarn with cluster mode
Date Sat, 14 Nov 2020 15:15:55 GMT
I guess I misread your message.  The archive directory should contain 
only jar files, not tar.gz files...
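
A minimal sketch of the copy step described in the quoted reply below, assuming the Spark jars and the application's extra jars are all staged as plain jar files in one HDFS directory. The variable names (SPARK_JARS_DIR, EXTRA_JARS_DIR, HDFS_JARS_DIR, DRY_RUN) and the /opt/spark fallback are hypothetical; hdfs:///spark-3/jars and lib are taken from the thread. With DRY_RUN=1 (the default here) the hdfs commands are only printed, not executed.

```shell
#!/usr/bin/env bash
# Sketch only: stage Spark jars plus extra application jars on HDFS and
# print a matching spark.yarn.jars line. All paths below are assumptions.
set -euo pipefail

SPARK_JARS_DIR="${SPARK_JARS_DIR:-${SPARK_HOME:-/opt/spark}/jars}"  # local Spark jars (assumed location)
EXTRA_JARS_DIR="${EXTRA_JARS_DIR:-lib}"                             # local extra jars, as in the thread
HDFS_JARS_DIR="${HDFS_JARS_DIR:-hdfs:///spark-3/jars}"              # HDFS target, as in the thread
DRY_RUN="${DRY_RUN:-1}"                                             # 1 = print commands only

# Execute a command, or just print it when DRY_RUN=1.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Create the HDFS directory and upload every jar from both local directories.
stage_jars() {
  local dir jar
  run hdfs dfs -mkdir -p "$HDFS_JARS_DIR"
  for dir in "$SPARK_JARS_DIR" "$EXTRA_JARS_DIR"; do
    for jar in "$dir"/*.jar; do
      [ -e "$jar" ] || continue          # skip when the glob matches nothing
      run hdfs dfs -put -f "$jar" "$HDFS_JARS_DIR/"
    done
  done
}

# Print the property line to paste into the properties file.
jars_property() {
  echo "spark.yarn.jars=$HDFS_JARS_DIR/*.jar"
}

stage_jars
jars_property
```

Running it with DRY_RUN=0 performs the actual upload. Per the Spark "Running on YARN" documentation, spark.yarn.archive replaces spark.yarn.jars when both are set, so the existing spark.yarn.archive line would presumably have to be removed from the properties file for the printed line to take effect. Whether this is the best fit for driver-only jars is worth checking against your Spark version.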

On 11/14/20 10:11 AM, Artemis User wrote:
>
> Assuming you are using Hadoop for your YARN cluster, you can set the 
> Spark properties spark.yarn.archive or spark.yarn.jars to point to the 
> jar directory or jar files so that Hadoop can find them by default. 
> See the Spark online documentation for details 
> (http://spark.apache.org/docs/latest/running-on-yarn.html#adding-other-jars). 
> For instance:
>
> spark.yarn.archive              hdfs:///spark-3/jars
>
> Please note that you will have to use the hadoop copy command to copy 
> your jars to HDFS before executing spark-submit (this part is not 
> obvious to a lot of non-Hadoop users).  You may also want to load ALL 
> Spark jars into that directory in advance to speed up the launch 
> process. You may want to contact your Hadoop admin for help.
>
> -- ND
>
> On 11/14/20 7:25 AM, Pedro Cardoso wrote:
>> Hello,
>>
>> I am submitting a Spark application on YARN using the cluster 
>> execution mode.
>> The application itself depends on a couple of jars. I can 
>> successfully submit and run the application using the spark-submit 
>> --jars option, as seen below:
>> spark-submit \
>>   --name Yarn-App \
>>   --class <FQN.Class> \
>>   --properties-file conf/yarn.properties \
>>   --jars lib/<first.jar>,lib/<second.jar>,lib/<third.jar> \
>>   <application.jar> > log/yarn-app.txt 2>&1
>>
>> With the yarn.properties being something like:
>> # Spark submit config which is used in conjunction with yarn cluster mode of execution to not block the spark-submit command
>> # for application completion.
>> spark.yarn.submit.waitAppCompletion=false
>> spark.submit.deployMode=cluster
>> spark.master=yarn
>>
>> ## General Spark Application properties
>> spark.driver.cores=2
>> spark.driver.memory=4G
>> spark.executor.memory=5G
>> spark.executor.cores=2
>> spark.driver.extraJavaOptions=-Xms2G
>> spark.driver.extraClassPath=<first.jar>:<second.jar>:<third.jar>
>> spark.executor.heartbeatInterval=30s
>> spark.shuffle.service.enabled=true
>> spark.dynamicAllocation.enabled: True
>> spark.dynamicAllocation.minExecutors: 1
>> spark.dynamicAllocation.maxExecutors: 100
>> spark.dynamicAllocation.initialExecutors: 10
>> spark.kryo.referenceTracking=false
>> spark.kryoserializer.buffer.max=1G
>> spark.ui.showConsoleProgress=true
>> spark.yarn.am.cores=4
>> spark.yarn.am.memory=10G
>> spark.yarn.archive=<HDFS path to spark-only jars>
>> spark.yarn.historyServer.address=<url to history server>
>>
>> However, I would like to have everything specified in the properties 
>> file to simplify the work of my team and not force them to specify 
>> the jars every time.
>> So my question is: what is the Spark property that replaces the 
>> spark-submit *--jars* parameter, such that I can specify everything in 
>> the properties file?
>>
>> I've tried creating a tar.gz with the contents of the archive 
>> specified in spark.yarn.archive plus the extra 3 jars that I need, 
>> uploading that to HDFS and changing the archive property, but it did 
>> not work.
>> I got class not defined exceptions on classes that come from the 3 
>> extra jars.
>>
>> If it helps, the jars are only required for the driver, not the 
>> executors, which will simply perform Spark-only operations.
>>
>> Thank you and have a good weekend.
>>
>> --
>>
>> *Pedro Cardoso*
>>
>> *Research Engineer*
>>
>> pedro.cardoso@feedzai.com <mailto:pedro.cardoso@feedzai.com>
>>
>> The content of this email is confidential and intended for the 
>> recipient specified in message only. It is strictly prohibited to 
>> share any part of this message with any third party, without a 
>> written consent of the sender. If you received this message by 
>> mistake, please reply to this message and follow with its deletion, 
>> so that we can ensure such a mistake does not occur in the future.
