spark-user mailing list archives

From Ankur Dave <ankurd...@gmail.com>
Subject Re: configuration needed to run twitter(25GB) dataset
Date Fri, 01 Aug 2014 10:45:44 GMT
At 2014-08-01 02:12:08 -0700, shijiaxin <shijiaxin.cn@gmail.com> wrote:
> When I use fewer partitions (like 6), it seems that all the tasks will be
> assigned to the same machine, because the machine has more than 6 cores. But
> this will run out of memory. How can I set a smaller number of partitions and
> still use all the machines at the same time?

Yes, I've encountered this problem myself. I haven't tried this, but one idea is to reduce
the number of cores that Spark is allowed to use on each worker, either by passing
--total-executor-cores to spark-submit (standalone mode) or by setting SPARK_WORKER_CORES
in conf/spark-env.sh on each worker. With fewer cores per machine, the scheduler is forced
to spread even a small number of partitions across the cluster instead of packing them
onto one node.
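
For example, a minimal sketch of both options (untested; the master URL, jar name, and
core counts are placeholders I'm assuming for a standalone cluster, not from the thread):

    # Cap the application at 12 cores in total; with standalone mode's default
    # spread-out scheduling, these tend to be distributed across the workers:
    spark-submit --master spark://master:7077 \
      --total-executor-cores 12 \
      my-app.jar

    # Or, in conf/spark-env.sh on every worker, limit the cores that worker
    # offers to Spark (takes effect after restarting the worker):
    export SPARK_WORKER_CORES=2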

Ankur
