spark-user mailing list archives

From Ankur Dave <ankurd...@gmail.com>
Subject Re: configuration needed to run twitter(25GB) dataset
Date Fri, 01 Aug 2014 07:16:03 GMT
At 2014-07-31 21:40:39 -0700, shijiaxin <shijiaxin.cn@gmail.com> wrote:
> Is it possible to reduce the number of edge partitions and exploit
> parallelism fully at the same time?
> For example, one partition per node, and the threads in the same node share
> the same partition.

It's theoretically possible to parallelize operations within a partition, but I wouldn't worry
about exploiting all available parallelism. PageRank is typically communication-bound rather
than computation-bound, so it can be a net gain to reduce the amount of communication by using
fewer partitions even if that means sacrificing some parallelism.
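
To make the trade-off above concrete, here is a toy model, not GraphX code. It assumes edges are assigned to partitions uniformly at random (roughly what a random vertex cut does) and that per-iteration communication grows with the number of (vertex, partition) replicas, since each vertex value has to be shipped to every edge partition holding one of its edges. The function names and graph sizes are made up for illustration.

```python
import random


def replication_factor(edges, num_partitions, seed=0):
    """Average number of edge partitions that each vertex appears in.

    Edges are placed in partitions uniformly at random (an assumption of
    this sketch). A higher value means more copies of each vertex have to
    be updated over the network on every iteration.
    """
    rng = random.Random(seed)
    replicas = set()  # distinct (vertex, partition) pairs
    vertices = set()  # distinct vertices that have at least one edge
    for src, dst in edges:
        part = rng.randrange(num_partitions)
        replicas.add((src, part))
        replicas.add((dst, part))
        vertices.update((src, dst))
    return len(replicas) / len(vertices)


def random_graph(num_vertices, num_edges, seed=1):
    """Hypothetical test graph: edges with uniformly random endpoints."""
    rng = random.Random(seed)
    return [
        (rng.randrange(num_vertices), rng.randrange(num_vertices))
        for _ in range(num_edges)
    ]


if __name__ == "__main__":
    edges = random_graph(200, 2000)
    for p in (1, 4, 16, 64):
        print(p, round(replication_factor(edges, p), 2))
```

With a single partition the factor is exactly 1, and it rises as partitions are added, so under these assumptions each extra partition buys parallelism at the price of more vertex copies to synchronize. The model ignores that partitions on the same machine can exchange data without touching the network, so treat it only as a rough picture.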

Ankur
