spark-user mailing list archives

From Antoine DUBOIS <antoine.dub...@cc.in2p3.fr>
Subject Building Spark + hadoop docker for openshift
Date Mon, 30 Mar 2020 12:03:13 GMT
Hello, 
I'm trying to build a Spark + Hadoop Docker image compatible with OpenShift.
I've used the oshinko Spark build script from https://github.com/radanalyticsio/openshift-spark
to build an image with the Hadoop jars on the classpath, so that S3 storage can be used.
However, I'm now stuck on the Spark entrypoint.sh script.
For reasons unknown, this script (kubernetes/dockerfiles/spark/entrypoint.sh)
contains a reference to SPARK_JAVA_OPS, which seems to have been deprecated since 2.2: https://issues.apache.org/jira/browse/SPARK-24577

I'm using Spark 2.4.5 and trying to integrate Hadoop 2.9.2. So far the image builds, but
every submit fails with an error in the entrypoint script.

Has anyone managed to use spark-submit on K8S, and if so, how? Is this entrypoint.sh file
still relevant?
Here are my spark-submit options:
./spark-submit \
--master k8s://https://wok.in2p3.fr \
--deploy-mode cluster \
--conf "spark.kubernetes.driverEnv.SPARK_DRIVER_CLASS=SparkPi" \
--conf "spark.kubernetes.driverEnv.SPARK_DRIVER_MEMORY=1024m" \
--conf "spark.kubernetes.driverEnv.SPARK_EXECUTOR_CORES=2" \
--conf "spark.kubernetes.driverEnv.SPARK_EXECUTOR_MEMORY=2048g" \
--name "test_$(date +'%m-%d-%y_%H:%M')" \
--conf "spark.kubernetes.container.image=private.repo/spark-docker:latest" \
--conf "spark.kubernetes.container.image.pullPolicy=Always" \
--conf "spark.kubernetes.container.image.pullSecrets=mysecret" \
--conf "spark.kubernetes.namespace=spark2" \
--conf "spark.executor.instances=4" \
--class SparkPi "local:///opt/jar/sparkpi_2.10-1.0.jar" 1000000000

Of course, /opt/jar/sparkpi_2.10-1.0.jar is part of my Docker build.

Thank you in advance. 


Antoine DUBOIS 
CCIN2P3 
