spark-user mailing list archives

From murat migdisoglu <murat.migdiso...@gmail.com>
Subject java.lang.ClassNotFoundException: com.hortonworks.spark.cloud.commit.PathOutputCommitProtocol
Date Wed, 17 Jun 2020 22:35:46 GMT
Hello all,
We have a Hadoop cluster (using YARN) that uses S3 as its filesystem, with
S3Guard enabled.
We are running Hadoop 3.2.1 with Spark 2.4.5.

When I try to save a DataFrame in Parquet format, I get the following
exception:
java.lang.ClassNotFoundException:
com.hortonworks.spark.cloud.commit.PathOutputCommitProtocol
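
For reference, the failing write itself is nothing special; a minimal sketch
(the bucket/path is a placeholder, and the session setup is sketched further
below):

  // `spark` is our already-configured SparkSession (see the config sketch below)
  val df = spark.range(100).toDF("id")
  // This plain Parquet save is what raises the ClassNotFoundException:
  df.write.mode("overwrite").parquet("s3a://our-bucket/tmp/parquet-out")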

My relevant Spark configurations are as follows:
"hadoop.mapreduce.outputcommitter.factory.scheme.s3a":"org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory",
"fs.s3a.committer.name": "magic",
"fs.s3a.committer.magic.enabled": true,
"fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",

While Spark streaming fails with the exception above, Apache Beam succeeds in
writing Parquet files.
What might be the problem?

Thanks in advance


-- 
"Talkers aren’t good doers. Rest assured that we’re going there to use our
hands, not our tongues."
W. Shakespeare
