spark-user mailing list archives

From Daniel Zhang <java8...@hotmail.com>
Subject java.lang.NoSuchFieldError: HIVE_STATS_JDBC_TIMEOUT on EMR
Date Mon, 18 Mar 2019 15:46:41 GMT
Hi,

I know the JIRA for this error (https://issues.apache.org/jira/browse/SPARK-18112), and I have read all the comments and even the PR for it.

But I am facing this issue on AWS EMR, and only in the Oozie Spark action. I am looking for someone who can give me a hint or a direction, so I can see whether I can overcome this issue on EMR.

I am testing a simple Spark application on EMR-5.12.2, which comes with Hadoop 2.8.3 + HCatalog 2.3.2 + Spark 2.2.1, and I am using the AWS Glue Data Catalog for both Hive and Spark table metadata.

First of all, both Hive and Spark work fine with AWS Glue as the metadata catalog, and my Spark application works with spark-submit.
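For context, the application itself is essentially just the following (a minimal sketch of what it does; "GlueCatalogTest" is only a placeholder name, and "sampledb" is one of the Glue databases shown below):

import org.apache.spark.sql.SparkSession

object GlueCatalogTest {
  def main(args: Array[String]): Unit = {
    // Hive support is enabled so the session goes through the (Glue-backed) Hive catalog
    val spark = SparkSession.builder()
      .appName("GlueCatalogTest")
      .enableHiveSupport()
      .getOrCreate()

    // The same kind of catalog queries as in the spark-shell session below
    spark.sql("show databases").show()
    spark.sql("show tables in sampledb").show()

    spark.stop()
  }
}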

[hadoop@ip-172-31-65-232 oozieJobs]$ spark-shell
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.1
      /_/

Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_171)
Type in expressions to have them evaluated.
Type :help for more information.
scala> spark.sql("show databases").show
+---------------+
|   databaseName|
+---------------+
|        default|
|googleanalytics|
|       sampledb|
+---------------+


I can access and query the database I created in Glue without any issue in spark-shell or spark-sql. And, as it relates to the later problem, I can see that in this working case "spark.sql.hive.metastore.version" is not set explicitly in spark-shell; it has the default value shown below:

scala> spark.conf.get("spark.sql.hive.metastore.version")
res2: String = 1.2.1


Even though it shows the version as "1.2.1", I know that with Glue the Hive metastore version should be "2.3.2"; I can see "hive-metastore-2.3.2-amzn-1.jar" in the Hive library path.

Now here comes the issue. The same Spark code, with "enableHiveSupport" on the Spark session, works with spark-submit on the command line, but when I run it in the Oozie Spark action it fails with the following error in the Oozie runtime:

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, HIVE_STATS_JDBC_TIMEOUT
java.lang.NoSuchFieldError: HIVE_STATS_JDBC_TIMEOUT
        at org.apache.spark.sql.hive.HiveUtils$.hiveClientConfigurations(HiveUtils.scala:200)
        at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:265)
        at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:66)
        at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:65)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:195)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:195)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:195)


I know this is most likely caused by the Oozie runtime classpath, but I have spent days trying and still cannot find a solution. We use Spark as the core of our ETL engine, and the ability to manage and query the Hive catalog is critical for us.
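For what it is worth, here is the kind of probe I can drop into the failing action to see which Hive classes the Oozie classpath actually resolves (a rough sketch only; if I understand SPARK-18112 correctly, HIVE_STATS_JDBC_TIMEOUT exists in the Hive 1.2.x ConfVars that Spark's built-in client was compiled against, but was removed in Hive 2.x):

// Rough diagnostic sketch: report which HiveConf class the action classpath loads,
// and whether its ConfVars enum still contains HIVE_STATS_JDBC_TIMEOUT.
val hiveConfClass = Class.forName("org.apache.hadoop.hive.conf.HiveConf")
val confVarsClass = Class.forName("org.apache.hadoop.hive.conf.HiveConf$ConfVars")

val loadedFrom = Option(hiveConfClass.getProtectionDomain.getCodeSource)
  .flatMap(cs => Option(cs.getLocation))
  .map(_.toString)
  .getOrElse("unknown")
println(s"HiveConf loaded from: $loadedFrom")

val hasTimeoutField = Option(confVarsClass.getEnumConstants)
  .exists(_.exists(_.toString == "HIVE_STATS_JDBC_TIMEOUT"))
println(s"ConfVars has HIVE_STATS_JDBC_TIMEOUT: $hasTimeoutField")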

Here is what puzzles me:

  *   I know this issue was supposed to be fixed in Spark 2.2.0, and on this EMR we are using Spark 2.2.1.
  *   There is a 1.2.1 version of the hive-metastore jar under the Spark jars on EMR. Does this mean that in the successful spark-shell runtime, Spark is indeed using the 1.2.1 version of hive-metastore?

[hadoop@ip-172-31-65-232 oozieJobs]$ ls /usr/lib/spark/jars/*hive-meta*
/usr/lib/spark/jars/hive-metastore-1.2.1-spark2-amzn-0.jar

  *   There is a 2.3.2 version of the hive-metastore jar under the Hive component on this EMR, which I believe is the one pointing to Glue, right?

[hadoop@ip-172-31-65-232 oozieJobs]$ ls /usr/lib/hive/lib/*hive-meta*
/usr/lib/hive/lib/hive-metastore-2.3.2-amzn-1.jar  /usr/lib/hive/lib/hive-metastore.jar

  *   I specified "oozie.action.sharelib.for.spark=spark,hive" in Oozie, and I can see that the Oozie runtime loads the jars from both the spark and hive sharelibs. There is NO hive-metastore-1.2.1-spark2-amzn-0.jar in the Oozie SPARK sharelib, and there is indeed hive-metastore-2.3.2-amzn-1.jar in the Oozie HIVE sharelib.
  *   Based on my understanding of (https://issues.apache.org/jira/browse/SPARK-18112), here is what I have tried so far to fix this in the Oozie runtime, but none of it works (a sketch of the equivalent session-level settings follows this list):
     *   I added hive-metastore-1.2.1-spark2-amzn-0.jar into the Oozie spark sharelib on HDFS and ran "oozie admin -sharelibupdate". After that, I confirmed this library is loaded in the Oozie runtime log of my Spark action, but I got the same error message.
     *   I added "--conf spark.sql.hive.metastore.version=2.3.2" in the <spark-opts> of my Oozie Spark action, and confirmed this configuration in the Spark session, but I still got the same error message above.
     *   I added "--conf spark.sql.hive.metastore.version=2.3.2 --conf spark.sql.hive.metastore.jars=maven", but still got the same error message.
     *   I added "--conf spark.sql.hive.metastore.version=2.3.2 --conf spark.sql.hive.metastore.jars=/etc/spark/conf/hive-site.xml,/usr/lib/spark/jars/*" in the Oozie Spark action, but got the same error message.
     *   I added "--conf spark.sql.hive.metastore.version=2.3.2 --conf hive.metastore.uris=thrift://ip-172-31-65-232.ec2.internal:9083 --conf spark.sql.hive.metastore.jars=/etc/spark/conf/hive-site.xml,/usr/lib/spark/jars/*" in the Oozie Spark action, but got the same error.
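For reference, this is roughly what those metastore settings look like when expressed at session-build time instead of in <spark-opts> (a sketch only, using the "maven" variant from the list above; "GlueCatalogTest" remains a placeholder name, and I am not sure these settings can help once the wrong Hive classes are already on the action classpath):

import org.apache.spark.sql.SparkSession

// The same configuration as the <spark-opts> attempts, set before the session
// (and therefore before its Hive metastore client) is created.
val spark = SparkSession.builder()
  .appName("GlueCatalogTest")
  .config("spark.sql.hive.metastore.version", "2.3.2")
  .config("spark.sql.hive.metastore.jars", "maven")
  .enableHiveSupport()
  .getOrCreate()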

I have run out of options to try, and I really have no idea what is missing in the Oozie runtime that causes this error in Spark.

Let me know if you have any ideas.

Thanks

Yong

