Hi all,

I'm trying to create Docker images that can access Azure services through the ABFS Hadoop driver, which is only available in Hadoop 3.2.
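
For context, the jobs need to read paths like

abfss://<container>@<account>.dfs.core.windows.net/some/path

with the storage key passed along the lines of (everything in angle brackets is a placeholder):

--conf spark.hadoop.fs.azure.account.key.<account>.dfs.core.windows.net=<access-key>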

So I downloaded the "Spark without Hadoop" distribution and built the base Spark images with the bundled docker-image-tool.sh itself.
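
Concretely, from the unpacked "without Hadoop" distro (registry and tag are just examples):

./bin/docker-image-tool.sh -r <my-registry> -t <tag> build
./bin/docker-image-tool.sh -r <my-registry> -t <tag> push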

In a new image that uses the resulting image as its FROM base, I've added the Hadoop 3.2 binary distribution and, following https://spark.apache.org/docs/2.2.0/hadoop-provided.html, I've set:

export SPARK_DIST_CLASSPATH=$(/path/to/hadoop/bin/hadoop classpath)
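
Roughly, the child image looks like this (tag and paths are illustrative; I put the export into conf/spark-env.sh because Docker's ENV can't run command substitution, and spark-env.sh is sourced by spark-submit at launch):

FROM <my-registry>/spark:<tag>
# ADD auto-extracts local tarballs
ADD hadoop-3.2.0.tar.gz /opt/
ENV HADOOP_HOME /opt/hadoop-3.2.0
# single quotes keep $(hadoop classpath) unexpanded until container runtime;
# I'm not sure the base image ships a conf/ dir, hence the mkdir -p
RUN mkdir -p ${SPARK_HOME}/conf && \
    echo 'export SPARK_DIST_CLASSPATH=$(${HADOOP_HOME}/bin/hadoop classpath)' \
      >> ${SPARK_HOME}/conf/spark-env.sh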

Then, when launching jobs on K8s, it turns out that the driver is started internally via spark-submit, but the executor seems to be launched with java directly (see the abridged entrypoint.sh excerpt after the stack trace below).
The result is that drivers run correctly, but executors fail due to a missing slf4j class:

Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/Logger
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
at java.lang.Class.getMethod0(Class.java:3018)
at java.lang.Class.getMethod(Class.java:1784)
at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: org.slf4j.Logger
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

If I add slf4j to the classpath manually, then another Hadoop class turns out to be missing.
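
The only workaround I can think of, assuming the executor's -cp really does come from $SPARK_CLASSPATH as sketched above, is to bake the Hadoop classpath into that variable in the Dockerfile. ENV can't expand $(hadoop classpath), so the globs would have to be spelled out by hand (tools/lib/* is where hadoop-azure and its dependencies live, if I remember correctly):

# hypothetical: manually mirror the output of "hadoop classpath"
ENV SPARK_CLASSPATH /opt/hadoop-3.2.0/etc/hadoop:/opt/hadoop-3.2.0/share/hadoop/common/lib/*:/opt/hadoop-3.2.0/share/hadoop/common/*:/opt/hadoop-3.2.0/share/hadoop/hdfs/*:/opt/hadoop-3.2.0/share/hadoop/tools/lib/*

But that feels brittle and version-specific, hence my question: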

What is the right way to build a Docker image for Spark 2.4 with a custom Hadoop distribution?


Thanks and regards
JL