spark-user mailing list archives

From Anahita Talebi <anahita.t.am...@gmail.com>
Subject Running a spark code on multiple machines using google cloud platform
Date Thu, 02 Feb 2017 12:29:58 GMT
Dear all,

I am trying to run Spark code on multiple machines by submitting a job
on Google Cloud Platform.
The inputs to my code are a training dataset and a testing dataset.

When I use a small training dataset (about 10 KB), the code runs
successfully on Google Cloud, but when I use a large dataset
(about 50 GB), I receive the following error:

17/02/01 19:08:06 ERROR org.apache.spark.scheduler.LiveListenerBus:
SparkListenerBus has already stopped! Dropping event
SparkListenerTaskEnd(2,0,ResultTask,TaskKilled,org.apache.spark.scheduler.TaskInfo@3101f3b3,null)

Can anyone give me a hint on how to solve this problem?

PS: I cannot use a small training dataset because my optimization
code needs to use all of the data.

I have to use Google Cloud Platform because I need to run the code on
multiple machines.

Thanks a lot,

Anahita
