spark-user mailing list archives

From Akhil Das <ak...@sigmoidanalytics.com>
Subject Re: cartesian on pyspark not paralleised
Date Sat, 06 Dec 2014 14:25:44 GMT
You could try increasing the level of parallelism
(spark.default.parallelism) when creating the SparkContext.
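
For example, a minimal sketch (untested; 200 is just an illustrative
value and "cartesian-parallelism" is a hypothetical app name - tune the
number to your cluster):

    from pyspark import SparkConf, SparkContext

    # spark.default.parallelism must be set before the SparkContext is
    # created; it is the partition count that operations like cartesian
    # fall back to when no explicit number of partitions is given.
    conf = (SparkConf()
            .setAppName("cartesian-parallelism")  # hypothetical name
            .set("spark.default.parallelism", "200"))  # illustrative value
    sc = SparkContext(conf=conf)

    rdd = sc.parallelize(range(1000))
    # If the product still comes back with too few partitions, you can
    # also repartition the result explicitly.
    pairs = rdd.cartesian(rdd).repartition(200)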

Thanks
Best Regards

On Fri, Dec 5, 2014 at 6:37 PM, Antony Mayi <antonymayi@yahoo.com.invalid>
wrote:

> Hi,
>
> I am using pyspark 1.1.0 on YARN 2.5.0. All operations run nicely in
> parallel - I can see multiple python processes spawned on each
> nodemanager - but for some reason, when running cartesian, there is
> only a single python process running on each node. The task indicates
> thousands of partitions, so I don't understand why it is not running
> with higher parallelism. The performance is obviously poor, although
> the other operations work well.
>
> Any idea how to improve this?
>
> Thank you,
> Antony.
