spark-user mailing list archives

From Jacek Laskowski <>
Subject Re: Why does sortByKey() transformation trigger a job in spark-shell?
Date Mon, 02 Nov 2015 13:51:45 GMT

Answering my own question after searching for sortByKey in the mailing
list archives and later in JIRA.

It turns out it's a known issue, filed as "sortByKey() launches
a cluster job when it shouldn't".
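For anyone landing here from the archives, the cause (per the JIRA discussion) is that sortByKey builds a RangePartitioner up front, and the RangePartitioner samples the input RDD to pick its range bounds; that sampling is the job you see. A quick spark-shell session to confirm (the sort itself still waits for an action):

```scala
// spark-shell session; `sc` is the SparkContext provided by the shell
val rdd = sc.parallelize(0 to 5).map(n => (n, util.Random.nextBoolean))

// This line triggers a sampling job ("sortByKey at <console>") even
// though no action has been called yet -- the RangePartitioner samples
// the keys eagerly to compute its range bounds.
val sorted = rdd.sortByKey()

// The actual shuffle-and-sort only runs here, at the action.
sorted.collect()
```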

It's labelled "starter", so it should not be that hard to fix. Does
this still hold? I'd like to work on it if it's "simple" and doesn't
get me swamped. Thanks!
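As a sketch of why it reads like a "starter" issue: the expensive bound computation just needs to be deferred until first use. In plain Scala (no Spark; the names below are purely illustrative, not Spark's real RangePartitioner API), the difference between the eager pattern and the deferred one is a `val` versus a `lazy val`:

```scala
// Illustrative sketch only -- not Spark's internals.
object PartitionerSketch {
  // Pick partitions-1 boundary keys by sorting and taking
  // evenly spaced keys.
  def computeBounds(keys: Seq[Int], partitions: Int): Seq[Int] = {
    val sorted = keys.sorted
    (1 until partitions).map(i => sorted((i * sorted.size) / partitions))
  }

  class EagerPartitioner(keys: Seq[Int], partitions: Int) {
    var boundsComputed = false // tracks when the work happens
    // Computed in the constructor: the analogue of the unwanted job.
    val bounds: Seq[Int] = {
      boundsComputed = true
      computeBounds(keys, partitions)
    }
  }

  class LazyPartitioner(keys: Seq[Int], partitions: Int) {
    var boundsComputed = false
    // `lazy val`: deferred until first access, i.e. until an action
    // actually needs the partitioning.
    lazy val bounds: Seq[Int] = {
      boundsComputed = true
      computeBounds(keys, partitions)
    }
  }
}
```

With the lazy variant, merely constructing the partitioner (the analogue of calling `sortByKey()`) does no work; the bounds are computed on first access.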


Jacek Laskowski

On Mon, Nov 2, 2015 at 2:34 PM, Jacek Laskowski <> wrote:
> Hi Sparkians,
> I use the latest Spark 1.6.0-SNAPSHOT in spark-shell with the default
> local[*] master.
> I created an RDD of pairs using the following snippet:
> val rdd = sc.parallelize(0 to 5).map(n => (n, util.Random.nextBoolean))
> It's all fine so far. The map transformation causes no computation.
> I thought all transformations are lazy and trigger no job until an
> action's called. It seems I was wrong with sortByKey()! When I called
> `rdd.sortByKey()`, it started a job: sortByKey at <console>:27 (!)
> Can anyone explain what makes for the different behaviour of sortByKey
> since it is a transformation and hence should be lazy? Is this a
> special transformation?
> Pozdrawiam,
> Jacek
> --
> Jacek Laskowski
