spark-user mailing list archives

From Mark Hamstra <m...@clearstorydata.com>
Subject Re: Spark Streaming - How to control the parallelism like storm
Date Tue, 22 Oct 2013 15:21:15 GMT
Not separately at the level of `flatMap` and `map`.  The number of
partitions in the RDD those operations are working on determines the
potential parallelism.  The number of worker cores available determines how
much of that potential can be actualized.
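Mark's point can be illustrated with a toy model in plain Scala rather than Spark. A dataset is represented as a sequence of partitions, and a fixed-size thread pool stands in for the worker cores. This is an assumption-laden sketch, not Spark's scheduler, and the names `PartitionParallelismSketch`, `effectiveParallelism`, `waves` and `wordCount` are hypothetical. The model runs one task per partition and applies both the `flatMap` and the `map` inside that same task. That is why there is no separate thread count per operation: Spark likewise pipelines narrow transformations within a stage. The knob you actually turn in Spark is the partition count, for example via the `numPartitions` argument of `reduceByKey`.

```scala
import java.util.concurrent.{Callable, Executors, Future}

// Toy model, NOT Spark: partitions are Seq[String]s and a fixed thread pool
// plays the role of the available worker cores.
object PartitionParallelismSketch {

  // Potential parallelism is one task per partition; actual concurrency is
  // capped by the number of cores.
  def effectiveParallelism(numPartitions: Int, numCores: Int): Int =
    math.min(numPartitions, numCores)

  // Number of "waves" of tasks if every task takes the same time.
  def waves(numPartitions: Int, numCores: Int): Int =
    (numPartitions + numCores - 1) / numCores

  // Runs one task per partition on `numCores` threads. flatMap and map are
  // applied inside the same task, so neither step has its own thread count.
  // Returns the word counts and the peak number of tasks seen running at once.
  def wordCount(partitions: Seq[Seq[String]], numCores: Int): (Map[String, Int], Int) = {
    val pool = Executors.newFixedThreadPool(numCores)
    val lock = new Object
    var running = 0
    var peak = 0
    try {
      val futures: Seq[Future[Seq[(String, Int)]]] = partitions.map { lines =>
        pool.submit(new Callable[Seq[(String, Int)]] {
          def call(): Seq[(String, Int)] = {
            lock.synchronized {
              running += 1
              if (running > peak) peak = running
            }
            try {
              Thread.sleep(20) // simulate work so that tasks overlap in time
              lines.flatMap(_.split(" ")).map(w => (w, 1)) // pipelined in one task
            } finally lock.synchronized { running -= 1 }
          }
        })
      }
      // "Shuffle + reduce" step: gather every partition's pairs and sum by key.
      val pairs = futures.flatMap(_.get())
      val counts = pairs.groupBy(_._1).map { case (w, ps) => (w, ps.map(_._2).sum) }
      (counts, lock.synchronized(peak))
    } finally pool.shutdown()
  }

  def main(args: Array[String]): Unit = {
    val partitions = Seq(Seq("a b a"), Seq("b c"), Seq("a"), Seq("c c"))
    val (counts, peak) = wordCount(partitions, numCores = 2)
    println(counts) // a=3, b=2, c=3 (print order may vary)
    println(s"peak concurrent tasks: $peak, bound: ${effectiveParallelism(partitions.size, 2)}")
  }
}
```

With four partitions and two cores, at most two tasks run at once and the work takes two waves. Adding cores beyond the partition count would not raise concurrency, and adding partitions beyond the core count only adds waves.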


On Tue, Oct 22, 2013 at 7:24 AM, Ryan Chan <ryanchan404@gmail.com> wrote:

> In Storm, you can control the number of threads with setSpout/setBolt.
> How can I do the same with Spark Streaming?
>
> e.g.
>
> val lines = ssc.socketTextStream(args(1), args(2).toInt)
> val words = lines.flatMap(_.split(" "))
> val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
> wordCounts.print()
> ssc.start()
>
>
> Sounds like I cannot tell Spark how many threads to use for `flatMap`
> and how many to use for `map`, etc., right?
>
>
>
