Did some tests. The concern is SSS job running under YARN
Scenario 1) use spark-sql-kafka-0-10_2.12-3.1.0.jar
This works fine
Scenario 2) use spark-sql-kafka-0-10_2.12-3.1.1.jar in spark-submit
it failed with
it failed with
Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
+1 on Sean's opinionOn Wed, Apr 7, 2021 at 2:17 PM Sean Owen <email@example.com> wrote:You shouldn't be modifying your cluster install. You may at this point have conflicting, excess JARs in there somewhere. I'd start it over if you can.On Wed, Apr 7, 2021 at 7:15 AM Gabor Somogyi <firstname.lastname@example.org> wrote:Not sure what you mean not working. You've added 3.1.1 to packages which uses:* 2.6.0 kafka-clients: https://github.com/apache/spark/blob/1d550c4e90275ab418b9161925049239227f3dc9/pom.xml#L136* 2.6.2 commons pool: https://github.com/apache/spark/blob/1d550c4e90275ab418b9161925049239227f3dc9/pom.xml#L183I think it worth an end-to-end dep-tree analysis what is really happening on the cluster...GOn Wed, Apr 7, 2021 at 11:11 AM Mich Talebzadeh <email@example.com> wrote:Hi Gabor et. al.,To be honest I am not convinced this package --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.1 is really working!I know for definite that spark-sql-kafka-0-10_2.12-3.1.0.jar works fine. I reported the package working before because under $SPARK_HOME/jars on all nodes there was a copy 3.0.1 jar file. Also in $SPARK_HOME/conf we had the following entries:spark.yarn.archive=hdfs://rhes75:9000/jars/spark-libs.jarspark.driver.extraClassPath $SPARK_HOME/jars/*.jarspark.executor.extraClassPath $SPARK_HOME/jars/*.jarSo the jar file was picked up first anyway.The concern I have is that that the package uses older version of jar files, namely: the following in .ivy2/jars-rw-r--r-- 1 hduser hadoop 6407352 Dec 19 13:14 com.github.luben_zstd-jni-1.4.8-1.jar-rw-r--r-- 1 hduser hadoop 129174 Apr 6 2019 org.apache.commons_commons-pool2-2.6.2.jar-rw-r--r-- 1 hduser hadoop 3754508 Jul 28 2020 org.apache.kafka_kafka-clients-2.6.0.jar-rw-r--r-- 1 hduser hadoop 387494 Feb 22 03:57 org.apache.spark_spark-sql-kafka-0-10_2.12-3.1.1.jar-rw-r--r-- 1 hduser hadoop 55766 Feb 22 03:58 org.apache.spark_spark-token-provider-kafka-0-10_2.12-3.1.1.jar-rw-r--r-- 1 hduser hadoop 649950 Jan 18 2020 org.lz4_lz4-java-1.7.1.jar-rw-r--r-- 1 hduser hadoop 41472 Dec 16 2019 org.slf4j_slf4j-api-1.7.30.jar-rw-r--r-- 1 hduser hadoop 2777 Oct 22 2014 org.spark-project.spark_unused-1.0.0.jar-rw-r--r-- 1 hduser hadoop 1969177 Nov 28 18:10 org.xerial.snappy_snappy-java-220.127.116.11.jarSo I am not sure. Hence I want someone to verify this independently in anger