Did some tests. The concern is SSS job running under YARN

Scenario 1)  use spark-sql-kafka-0-10_2.12-3.1.0.jar

  • Removed spark-sql-kafka-0-10_2.12-3.1.0.jar from anywhere on CLASSPATH including $SPARK_HOME/jars
  • Added the said jar file to spark-submit in client mode (the only mode available to PySpark) with --jars
  • spark-submit --master yarn --deploy-mode client --conf spark.pyspark.virtualenv.enabled=true .. bla bla..  --driver-memory 4G --executor-memory 4G --num-executors 2 --executor-cores 2 --jars $HOME/jars/spark-sql-kafka-0-10_2.12-3.1.0.jar xyz.py

This works fine

Scenario 2) use spark-sql-kafka-0-10_2.12-3.1.1.jar in spark-submit

  •  spark-submit --master yarn --deploy-mode client --conf spark.pyspark.virtualenv.enabled=true ..bla bla.. --driver-memory 4G --executor-memory 4G --num-executors 2 --executor-cores 2 --jars $HOME/jars/spark-sql-kafka-0-10_2.12-3.1.1.jar xyz.py

it failed with 

  • Caused by: java.lang.NoSuchMethodError: org.apache.spark.kafka010.KafkaTokenUtil$.needTokenUpdate(Ljava/util/Map;Lscala/Option;)Z

  • spark-submit --master yarn --deploy-mode client --conf spark.pyspark.virtualenv.enabled=true ..bla bla.. --driver-memory 4G --executor-memory 4G --num-executors 2 --executor-cores 2 --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.1 xyz.py

it failed with

  • Caused by: java.lang.NoSuchMethodError: org.apache.spark.kafka010.KafkaTokenUtil$.needTokenUpdate(Ljava/util/Map;Lscala/Option;)Z


   view my Linkedin profile


Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.


On Wed, 7 Apr 2021 at 13:20, Gabor Somogyi <gabor.g.somogyi@gmail.com> wrote:
+1 on Sean's opinion

On Wed, Apr 7, 2021 at 2:17 PM Sean Owen <srowen@gmail.com> wrote:
You shouldn't be modifying your cluster install. You may at this point have conflicting, excess JARs in there somewhere. I'd start it over if you can.

On Wed, Apr 7, 2021 at 7:15 AM Gabor Somogyi <gabor.g.somogyi@gmail.com> wrote:
Not sure what you mean not working. You've added 3.1.1 to packages which uses:

I think it worth an end-to-end dep-tree analysis what is really happening on the cluster...


On Wed, Apr 7, 2021 at 11:11 AM Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
Hi Gabor et. al.,

To be honest I am not convinced this package --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.1 is really working!

I know for definite that spark-sql-kafka-0-10_2.12-3.1.0.jar works fine. I reported the package working before because under $SPARK_HOME/jars on all nodes there was a copy 3.0.1 jar file. Also in $SPARK_HOME/conf we had the following entries:

spark.driver.extraClassPath        $SPARK_HOME/jars/*.jar
spark.executor.extraClassPath      $SPARK_HOME/jars/*.jar

So the jar file was picked up first anyway.

The concern I have is that that the package uses older version of jar files, namely: the following in .ivy2/jars

-rw-r--r-- 1 hduser hadoop 6407352 Dec 19 13:14 com.github.luben_zstd-jni-1.4.8-1.jar
-rw-r--r-- 1 hduser hadoop  129174 Apr  6  2019 org.apache.commons_commons-pool2-2.6.2.jar
-rw-r--r-- 1 hduser hadoop 3754508 Jul 28  2020 org.apache.kafka_kafka-clients-2.6.0.jar
-rw-r--r-- 1 hduser hadoop  387494 Feb 22 03:57 org.apache.spark_spark-sql-kafka-0-10_2.12-3.1.1.jar
-rw-r--r-- 1 hduser hadoop   55766 Feb 22 03:58 org.apache.spark_spark-token-provider-kafka-0-10_2.12-3.1.1.jar
-rw-r--r-- 1 hduser hadoop  649950 Jan 18  2020 org.lz4_lz4-java-1.7.1.jar
-rw-r--r-- 1 hduser hadoop   41472 Dec 16  2019 org.slf4j_slf4j-api-1.7.30.jar
-rw-r--r-- 1 hduser hadoop    2777 Oct 22  2014 org.spark-project.spark_unused-1.0.0.jar
-rw-r--r-- 1 hduser hadoop 1969177 Nov 28 18:10 org.xerial.snappy_snappy-java-

So I am not sure. Hence I want someone to verify this independently in anger