spark-user mailing list archives

From Stephen Coy <s...@infomedia.com.au.INVALID>
Subject Re: java.lang.ClassNotFoundException for s3a committer
Date Fri, 19 Jun 2020 02:07:25 GMT
Hi Murat Migdisoglu,

Unfortunately you need the secret sauce to resolve this.

It is necessary to check out the Apache Spark source code and build it with the right command
line options. This is what I have been using:

dev/make-distribution.sh --name my-spark --tgz -Pyarn -Phadoop-3.2 -Phadoop-cloud \
  -Dhadoop.version=3.2.1

This will add additional jars into the build.

Copy hadoop-aws-3.2.1.jar, hadoop-openstack-3.2.1.jar and spark-hadoop-cloud_2.12-3.0.0.jar
into the “jars” directory of your Spark distribution. If you are paranoid you could copy/replace
all the hadoop-*-3.2.1.jar files, but I have not found that necessary.
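Spelled out as shell (jar names as above; the mock directories below just make the snippet self-contained, so substitute your real build output and Spark distribution paths):

```shell
set -e
# Mock layout: "assembly" stands in for the build output, "dist/jars" for the
# Spark distribution's jars directory (real paths will differ on your machine).
mkdir -p assembly dist/jars
touch assembly/hadoop-aws-3.2.1.jar \
      assembly/hadoop-openstack-3.2.1.jar \
      assembly/spark-hadoop-cloud_2.12-3.0.0.jar

# The actual step: copy the three extra jars into the distro's jars directory.
cp assembly/hadoop-aws-3.2.1.jar \
   assembly/hadoop-openstack-3.2.1.jar \
   assembly/spark-hadoop-cloud_2.12-3.0.0.jar dist/jars/

ls dist/jars
```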

You will also need to upgrade the version of guava that appears in the Spark distro, because
Hadoop 3.2.1 bumped it from guava-14.0.1.jar to guava-27.0-jre.jar. Otherwise you will get
ClassNotFoundException errors at runtime.
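The guava swap looks like this (version numbers as above; the mock directories just make the snippet runnable — in practice you would take guava-27.0-jre.jar from your local Maven cache or Maven Central):

```shell
set -e
# Mock distro jars directory containing Spark's old bundled guava.
mkdir -p dist/jars
touch dist/jars/guava-14.0.1.jar
# Stand-in for the real jar fetched from ~/.m2 or Maven Central.
mkdir -p downloads
touch downloads/guava-27.0-jre.jar

# Remove the old guava and drop in the version Hadoop 3.2.1 needs.
rm dist/jars/guava-14.0.1.jar
cp downloads/guava-27.0-jre.jar dist/jars/

ls dist/jars
```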

I have been using this combo for many months now with the Spark 3.0 pre-releases and it has
been working great.
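For completeness, the committer also has to be wired up on the Spark side. The settings I understand to be relevant (property names from the Spark cloud-integration docs; verify them against your version) look roughly like this in spark-defaults.conf:

```properties
spark.sql.sources.commitProtocolClass     org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
spark.sql.parquet.output.committer.class  org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter
```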

Cheers,

Steve C


On 19 Jun 2020, at 10:24 am, murat migdisoglu <murat.migdisoglu@gmail.com> wrote:

Hi all
I've upgraded my test cluster to Spark 3 and changed my committer to "directory", and I still get
this error. The documentation is somewhat obscure on this point.
Do I need to add a third-party jar to support the new committers?

java.lang.ClassNotFoundException: org.apache.spark.internal.io.cloud.PathOutputCommitProtocol


On Thu, Jun 18, 2020 at 1:35 AM murat migdisoglu <murat.migdisoglu@gmail.com> wrote:
Hello all,
we have a Hadoop cluster (using YARN) that uses S3 as its filesystem, with S3Guard enabled.
We are using Hadoop 3.2.1 with Spark 2.4.5.

When I try to save a dataframe in parquet format, I get the following exception:
java.lang.ClassNotFoundException: com.hortonworks.spark.cloud.commit.PathOutputCommitProtocol

My relevant Spark configurations are as follows:
"hadoop.mapreduce.outputcommitter.factory.scheme.s3a":"org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory",
"fs.s3a.committer.name": "magic",
"fs.s3a.committer.magic.enabled": true,
"fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",

While Spark Streaming fails with the exception above, Apache Beam succeeds in writing parquet
files.
What might be the problem?

Thanks in advance


--
"Talkers aren’t good doers. Rest assured that we’re going there to use our hands, not
our tongues."
W. Shakespeare


