spark-user mailing list archives

From Heji Kim <hster.investiga...@gmail.com>
Subject persistence iops and throughput check? Re: Running a spark code on multiple machines using google cloud platform
Date Fri, 03 Feb 2017 00:50:27 GMT
Dear Anahita,

When we run performance tests for Spark/YARN clusters on GCP, we have to
make sure we stay within the IOPS and throughput limits. Depending on the disk
type (standard or SSD) and the size of the disk, you only get so much max
sustained IOPS and throughput per second. The GCP instance metrics graphs are
not great, but they are enough to determine whether you are over the limit.

https://cloud.google.com/compute/docs/disks/performance
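The page above describes persistent disk limits that scale with disk size. As a minimal sketch of that scaling model, the helper below estimates a disk's sustained ceilings from its size; the per-GB rates in the table are illustrative assumptions, not authoritative values, so check the linked page for the current numbers:

```python
# Sketch: estimate GCP persistent disk performance ceilings from disk size,
# assuming the linear per-GB scaling model described in the GCP disk
# performance docs. The per-GB rates below are ASSUMED placeholder values
# for illustration only -- verify against the documentation.

PER_GB_LIMITS = {
    # disk_type: (read IOPS/GB, write IOPS/GB, read MB/s per GB, write MB/s per GB)
    "pd-standard": (0.75, 1.5, 0.12, 0.12),
    "pd-ssd": (30.0, 30.0, 0.48, 0.48),
}

def disk_limits(disk_type, size_gb):
    """Return estimated sustained (read_iops, write_iops, read_mbps, write_mbps)."""
    r_iops, w_iops, r_tp, w_tp = PER_GB_LIMITS[disk_type]
    return (size_gb * r_iops, size_gb * w_iops, size_gb * r_tp, size_gb * w_tp)

if __name__ == "__main__":
    # Compare a 500 GB standard disk against a 500 GB SSD disk.
    for disk in ("pd-standard", "pd-ssd"):
        print(disk, disk_limits(disk, 500))
```

If your Spark job's shuffle or spill traffic exceeds these ceilings, tasks stall on disk I/O; comparing the estimated ceiling against the instance metrics graphs tells you whether the disk, rather than Spark itself, is the bottleneck.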

Heji

On Thu, Feb 2, 2017 at 4:29 AM, Anahita Talebi <anahita.t.amiri@gmail.com>
wrote:

> Dear all,
>
> I am trying to run Spark code on multiple machines using the submit job
> feature of Google Cloud Platform.
> As inputs to my code, I have a training and a testing dataset.
>
> When I use a small training dataset (about 10 KB), the code runs
> successfully on Google Cloud, but when I use a large dataset of
> about 50 GB, I receive the following error:
>
> 17/02/01 19:08:06 ERROR org.apache.spark.scheduler.LiveListenerBus: SparkListenerBus
> has already stopped! Dropping event SparkListenerTaskEnd(2,0,ResultTask,TaskKilled,org.apache.spark.scheduler.TaskInfo@3101f3b3,null)
>
> Can anyone give me a hint on how I can solve this problem?
>
> PS: I cannot use a small training dataset because I have an optimization
> code which needs to use all the data.
>
> I have to use Google Cloud Platform because I need to run the code on multiple machines.
>
> Thanks a lot,
>
> Anahita
>
>
