spark-user mailing list archives

From Antoine DUBOIS <>
Subject Building Spark + hadoop docker for openshift
Date Mon, 30 Mar 2020 12:03:13 GMT
I'm trying to build a Spark + Hadoop Docker image compatible with OpenShift. 
I've used the oshinko Spark build script here

to build an image with the Hadoop jars on the classpath, to allow use of S3 storage. 
However, I'm now stuck on the Spark script. 
For reasons unknown, this script in kubernetes/dockerfiles/spark/ 
contains a reference to SPARK_JAVA_OPTS, which seems deprecated since 2.2.

I'm using Spark 2.4.5 and trying to integrate Hadoop 2.9.2. So far the image builds, but submit fails
every time with an error in the entrypoint script. 

Has any of you managed to use spark-submit on K8s? If so, how, and is this file 
relevant? 
Here are my spark-submit options: 
./spark-submit \ 
--master k8s:// \ 
--deploy-mode cluster \ 
--conf "spark.kubernetes.driverEnv.SPARK_DRIVER_CLASS=SparkPi" \ 
--conf "spark.kubernetes.driverEnv.SPARK_DRIVER_MEMORY=1024m" \ 
--conf "spark.kubernetes.driverEnv.SPARK_EXECUTOR_CORES=2" \ 
--conf "spark.kubernetes.driverEnv.SPARK_EXECUTOR_MEMORY=2048g" \ 
--name "test_$(date +'%m-%d-%y_%H:%m')" \ 
--conf "spark.kubernetes.container.image=private.repo/spark-docker:latest" \ 
--conf "spark.kubernetes.container.image.pullPolicy=Always" \ 
--conf "spark.kubernetes.container.image.pullSecrets=mysecret" \ 
--conf "spark.kubernetes.namespace=spark2" \ 
--conf "spark.executor.instances=4" \ 
--class SparkPi "local:///opt/jar/sparkpi_2.10-1.0.jar" 1000000000 

Of course, /opt/jar/sparkpi_2.10-1.0.jar is part of my Docker build. 
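For comparison, a sketch of the same submit using the conf keys documented in the Spark 2.4 "Running on Kubernetes" guide (spark.driver.memory, spark.executor.memory, spark.executor.cores, etc.) rather than the spark.kubernetes.driverEnv.* variables. The API server address and the service account name are placeholders, not values from my setup, and 2048m assumes the 2048g above was a typo:

```shell
# Hedged sketch: documented Spark 2.4 K8s conf keys instead of driverEnv vars.
# <api-server-host>:<port> and the "spark" service account are placeholders.
./spark-submit \
  --master k8s://https://<api-server-host>:<port> \
  --deploy-mode cluster \
  --name sparkpi-test \
  --class SparkPi \
  --conf spark.driver.memory=1024m \
  --conf spark.executor.cores=2 \
  --conf spark.executor.memory=2048m \
  --conf spark.executor.instances=4 \
  --conf spark.kubernetes.namespace=spark2 \
  --conf spark.kubernetes.container.image=private.repo/spark-docker:latest \
  --conf spark.kubernetes.container.image.pullPolicy=Always \
  --conf spark.kubernetes.container.image.pullSecrets=mysecret \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  local:///opt/jar/sparkpi_2.10-1.0.jar 1000
```
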

Thank you in advance. 

Antoine DUBOIS 
