spark-user mailing list archives

From Anahita Talebi <>
Subject Re: (send this email to subscribe)
Date Wed, 04 Jan 2017 21:33:10 GMT
Thanks for your answer.

I actually followed this link,

which is the same as the one you gave, but with a different example. The
problem is that it is not working.

I did the following steps:

1) Create a project
2) Enable billing and the Dataproc API
3) Create a cluster named "cluster-1". It consists of a master and 2 workers:
     - cluster-1-m (the master)
     - cluster-1-w-0
     - cluster-1-w-1
4) Connect to cluster-1-m with ssh (as you can see in the image)
5) Install sbt
6) Create a HelloWorld.scala as an example
7) Run the code using sbt
8) Submit the job
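For reference, the HelloWorld.scala in step 6 might look something like this — a minimal sketch, assuming a plain hello-world program, since the actual file contents are not shown in the thread:

```scala
// Hypothetical minimal HelloWorld.scala; the real file from step 6
// was not shown in the thread.
object HelloWorld {
  // The greeting is a separate value so it can be inspected in tests.
  val greeting: String = "Hello, World!"

  def main(args: Array[String]): Unit =
    println(greeting)
}
```

Note that whatever name is passed as the main class at submit time has to match this object's fully qualified name.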

But I think my problem is mostly in the last part, "submit the job".
For the main class or jar, I tried the following options:

1) HelloWorld
2) gs://dataproc-d7b4de0b-558a-4682-8d5f-59003ead6173-e//HelloWord.jar

The latter is the bucket that, I think, was created when I ran sbt.

With both of these options, the execution fails, and I don't know how
to solve it.
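For comparison, the submit step done from the command line would look roughly like the following. This is a sketch under assumptions: the cluster name comes from step 3, but the jar path and class name are placeholders, not the exact values used in the thread.

```shell
# Hypothetical sketch; requires the gcloud SDK and a configured project.
# The bucket and jar names below are placeholders.
gcloud dataproc jobs submit spark \
    --cluster cluster-1 \
    --class HelloWorld \
    --jars gs://my-bucket/helloworld_2.11-0.1.jar
```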

When I choose only HelloWorld, I get the following error:

java.lang.ClassNotFoundException: HelloWorld
	at java.lang.ClassLoader.loadClass(
	at java.lang.ClassLoader.loadClass(
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(
	at org.apache.spark.util.Utils$.classForName(Utils.scala:228)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:693)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
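The trace shows the failure coming out of Class.forName (via Spark's Utils.classForName), which means Spark looked for a class literally named "HelloWorld" on the classpath and did not find one. If the object is declared inside a package, the fully qualified name must be used. A small self-contained illustration of the exact-name requirement — the class names here are just JDK examples, not from the thread:

```scala
object ClassLookupDemo {
  // Spark ends up calling Class.forName with the string supplied as the
  // main class, so the name must match the compiled class name exactly.
  def exists(name: String): Boolean =
    try { Class.forName(name); true }
    catch { case _: ClassNotFoundException => false }

  def main(args: Array[String]): Unit = {
    println(exists("java.lang.String")) // fully qualified name: found
    println(exists("String"))           // short name alone: not found
  }
}
```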

When I choose
gs://dataproc-d7b4de0b-558a-4682-8d5f-59003ead6173-e//HelloWord.jar, I get:

=========== Cloud Dataproc Agent Error =========== File not found :
	at org.apache.hadoop.fs.FileUtil.copy(
	at org.apache.hadoop.fs.FileUtil.copy(
	at org.apache.hadoop.fs.FileSystem.copyToLocalFile(
	at org.apache.hadoop.fs.FileSystem.copyToLocalFile(
	at org.apache.hadoop.fs.FileSystem.copyToLocalFile(
	at java.util.concurrent.Executors$
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(
	at java.util.concurrent.ScheduledThreadPoolExecutor$
	at java.util.concurrent.ThreadPoolExecutor.runWorker(
	at java.util.concurrent.ThreadPoolExecutor$
======== End of Cloud Dataproc Agent Error ========

[image: Inline image 1]
[image: Inline image 2]


On Wed, Jan 4, 2017 at 7:08 PM, Dinko Srkoč <> wrote:

> You can run a Spark app on Dataproc, which is Google's managed Spark and
> Hadoop service:
> Basically, you:
> * assemble a jar
> * create a cluster
> * submit a job to that cluster (with the jar)
> * delete the cluster when the job is done
> Before all that, one has to create a Cloud Platform project and enable
> billing and the Dataproc API - but all this is explained in the docs.
> Cheers,
> Dinko
> On 4 January 2017 at 17:34, Anahita Talebi <>
> wrote:
> >
> > To whom it might concern,
> >
> > I have a question about running Spark code on Google Cloud.
> >
> > I have a Spark application and would like to run it on multiple
> > machines on Google Cloud. Unfortunately, I couldn't find good
> > documentation on how to do this.
> >
> > Do you have any hints which could help me to solve my problem?
> >
> > Have a nice day,
> >
> > Anahita
> >
> >
