spark-user mailing list archives

From Xiangrui Meng
Subject Re: Issue with zip and partitions
Date Wed, 02 Apr 2014 06:43:14 GMT
From API docs: "Zips this RDD with another one, returning key-value
pairs with the first element in each RDD, second element in each RDD,
etc. Assumes that the two RDDs have the *same number of partitions*
and the *same number of elements in each partition* (e.g. one was made
through a map on the other)."

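Basically, one RDD should be a mapped RDD of the other, or both RDDs
should be mapped RDDs of the same RDD, with nothing in between (such as
a filter or a repartition) that changes the number of partitions or the
number of elements in any partition.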
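
To make the two requirements concrete, here is a minimal sketch in plain
Python. It is not Spark code: it models an RDD as a list of partitions
(each a list), and every name in it (parallelize, map_elements,
filter_elements, zip_partitions) and both error strings are hypothetical
stand-ins chosen for illustration. It only shows why a map keeps two
datasets zippable while a filter or a different partition count does not.

```python
from typing import Any, Callable, List, Tuple

# Hypothetical stand-in for an RDD: one inner list per partition.
Partitions = List[List[Any]]


def parallelize(data: List[Any], num_partitions: int) -> Partitions:
    """Split data into num_partitions contiguous slices."""
    n = len(data)
    return [
        data[i * n // num_partitions:(i + 1) * n // num_partitions]
        for i in range(num_partitions)
    ]


def map_elements(parts: Partitions, f: Callable[[Any], Any]) -> Partitions:
    """Element-wise map: keeps partition count and per-partition sizes."""
    return [[f(x) for x in part] for part in parts]


def filter_elements(parts: Partitions, pred: Callable[[Any], bool]) -> Partitions:
    """Filter: keeps partition count but may shrink individual partitions."""
    return [[x for x in part if pred(x)] for part in parts]


def zip_partitions(a: Partitions, b: Partitions) -> List[List[Tuple[Any, Any]]]:
    """Pair elements partition by partition, enforcing both requirements."""
    if len(a) != len(b):
        raise ValueError("unequal numbers of partitions")
    zipped = []
    for part_a, part_b in zip(a, b):
        if len(part_a) != len(part_b):
            raise ValueError("unequal numbers of elements in a partition")
        zipped.append(list(zip(part_a, part_b)))
    return zipped


base = parallelize(list(range(10)), 4)

# Safe: one dataset is a map of the other.
doubled = map_elements(base, lambda x: 2 * x)
ok = zip_partitions(base, doubled)

# Unsafe: same partition count, but the filter changed per-partition sizes.
evens = filter_elements(base, lambda x: x % 2 == 0)
try:
    zip_partitions(base, evens)
except ValueError as err:
    print("filter case rejected:", err)

# Unsafe: same data, but a different number of partitions.
try:
    zip_partitions(base, parallelize(list(range(10)), 3))
except ValueError as err:
    print("repartition case rejected:", err)
```

In this model, two datasets built independently can fail either check
even when they hold the same total number of elements, because nothing
guarantees that the elements are spread over the partitions in the same
way.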

Btw, your message says "Dell - Internal Use - Confidential"...


On Tue, Apr 1, 2014 at 7:27 PM,  <> wrote:
> Dell - Internal Use - Confidential
> I got an exception "can't zip RDDs with unequal numbers of partitions" when
> I apply any action (reduce, collect) to a dataset created by zipping two
> datasets of 10 million entries each.  The problem occurs regardless of the
> number of partitions, including when I let Spark create those partitions.
> Interestingly enough, I have no problem zipping datasets of 1 and 2.5
> million entries.
> A similar problem was reported on this board with 0.8, but I do not remember
> whether the problem was fixed.
> Any ideas? Any workaround?
> I would appreciate any help.
