spark-user mailing list archives

From Anahita Talebi <anahita.t.am...@gmail.com>
Subject Re: (send this email to subscribe)
Date Wed, 04 Jan 2017 21:33:10 GMT
Hello,
Thanks for your answer.

I actually followed this link:
https://cloud.google.com/dataproc/docs/tutorials/spark-scala#using_sbt

which is the same as the one you gave, but with a different example. The
problem is that it is not working.


I did the following steps:

1) Created a project
2) Enabled billing and the Dataproc API
3) Created a cluster named "cluster-1", consisting of a master and 2 workers:
     - cluster-1-m (the master)
     - cluster-1-w-0
     - cluster-1-w-1
4) Connected to cluster-1-m over ssh
5) Installed sbt
6) Created a HelloWorld.scala as an example
7) Ran the code using sbt
8) Submitted the job
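
The steps above can be sketched as shell commands run on the master node. The
local jar path and the bucket name below are placeholders, not my actual ones:

```shell
# Sketch of the build-and-submit flow (bucket and jar names are placeholders).
sbt package    # builds something like target/scala-2.11/<name>_2.11-<version>.jar

# Copy the built jar to a Cloud Storage bucket the cluster can read.
gsutil cp target/scala-2.11/helloworld_2.11-0.1.jar \
    gs://my-bucket/HelloWorld.jar

# Submit the job to the cluster, naming both the main class and the jar.
gcloud dataproc jobs submit spark \
    --cluster cluster-1 \
    --class HelloWorld \
    --jars gs://my-bucket/HelloWorld.jar
```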

But I think my problem is mostly in the last part, "submit job".
For the main class or jar, I tried the following options:

1) HelloWorld
2) gs://dataproc-d7b4de0b-558a-4682-8d5f-59003ead6173-e//HelloWord.jar

dataproc-d7b4de0b-558a-4682-8d5f-59003ead6173-eu is the name of the
bucket which, I think, was created when I ran sbt.

With both of these options the execution fails, and I don't know how to
solve it.

When I use only HelloWorld, I get the following error:

java.lang.ClassNotFoundException: HelloWorld
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.spark.util.Utils$.classForName(Utils.scala:228)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:693)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

and when I choose
gs://dataproc-d7b4de0b-558a-4682-8d5f-59003ead6173-e//HelloWord.jar, I get:

=========== Cloud Dataproc Agent Error ===========
java.io.FileNotFoundException: File not found :
gs://dataproc-d7b4de0b-558a-4682-8d5f-59003ead6173-eu/HelloWorld.jar
	at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.getFileStatus(GoogleHadoopFileSystemBase.java:1293)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
	at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2034)
	at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2003)
	at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.copyToLocalFile(GoogleHadoopFileSystemBase.java:1872)
	at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1979)
	at com.google.cloud.hadoop.services.agent.util.HadoopUtil.download(HadoopUtil.java:71)
	at com.google.cloud.hadoop.services.agent.job.AbstractJobHandler.downloadResources(AbstractJobHandler.java:418)
	at com.google.cloud.hadoop.services.agent.job.AbstractJobHandler$StartDriver.call(AbstractJobHandler.java:528)
	at com.google.cloud.hadoop.services.agent.job.AbstractJobHandler$StartDriver.call(AbstractJobHandler.java:511)
	at com.google.cloud.hadoop.services.repackaged.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
	at com.google.cloud.hadoop.services.repackaged.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)
	at com.google.cloud.hadoop.services.repackaged.com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
======== End of Cloud Dataproc Agent Error ========
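
As far as I can tell, the agent simply does not find a jar at that path. I
also notice that the path I submitted says HelloWord.jar while the error
reports HelloWorld.jar, and the bucket suffix differs (-e vs -eu), so a
spelling mismatch seems possible. Listing the bucket would show the exact
name and location of whatever is actually there:

```shell
# List everything in the bucket to see the jar's exact name and path.
gsutil ls -r gs://dataproc-d7b4de0b-558a-4682-8d5f-59003ead6173-eu/

# If the jar is missing, upload it explicitly (local path is a placeholder):
gsutil cp target/scala-2.11/helloworld_2.11-0.1.jar \
    gs://dataproc-d7b4de0b-558a-4682-8d5f-59003ead6173-eu/HelloWorld.jar
```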






Thanks,
Anahita


On Wed, Jan 4, 2017 at 7:08 PM, Dinko Srkoč <dinko.srkoc@gmail.com> wrote:

> You can run a Spark app on Dataproc, which is Google's managed Spark and
> Hadoop service:
>
> https://cloud.google.com/dataproc/docs/
>
> basically, you:
>
> * assemble a jar
> * create a cluster
> * submit a job to that cluster (with the jar)
> * delete a cluster when the job is done
>
> Before all that, one has to create a Cloud Platform project and enable
> billing and the Dataproc API - but all this is explained in the docs.
>
> Cheers,
> Dinko
>
>
> On 4 January 2017 at 17:34, Anahita Talebi <anahita.t.amiri@gmail.com>
> wrote:
> >
> > To whom it might concern,
> >
> > I have a question about running Spark code on Google Cloud.
> >
> > I have a Spark job and would like to run it using multiple
> > machines on Google Cloud. Unfortunately, I couldn't find good
> > documentation on how to do it.
> >
> > Do you have any hints which could help me to solve my problem?
> >
> > Have a nice day,
> >
> > Anahita
> >
> >
>
