spark-user mailing list archives

From Patrick Wendell <pwend...@gmail.com>
Subject Re: KafkaInputDStream mapping of partitions to tasks
Date Thu, 27 Mar 2014 19:22:19 GMT
If you call repartition() on the original stream you can set the level of
parallelism after it's ingested from Kafka. I'm not sure how it maps Kafka
topic partitions to tasks for the ingest, though.
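
A minimal sketch of the repartition() suggestion, assuming the receiver-based
KafkaUtils.createStream API from the spark-streaming-kafka module. The
ZooKeeper address, consumer group, topic name, batch interval, and object name
are placeholders for illustration and do not come from the original job; the
partition count of 8 simply mirrors the topic's 8 partitions.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object RepartitionedKafkaStream {
  def main(args: Array[String]): Unit = {
    // App name and batch interval are assumptions, not values from the thread.
    val conf = new SparkConf().setAppName("RepartitionedKafkaStream")
    val ssc = new StreamingContext(conf, Seconds(2))

    // Receiver-based Kafka stream of (key, message) pairs.
    val stream = KafkaUtils.createStream(
      ssc,
      "zkhost:2181",              // hypothetical ZooKeeper quorum
      "example-group",            // hypothetical consumer group id
      Map("example-topic" -> 8))  // hypothetical topic -> consumer thread count

    // Redistribute each batch across 8 partitions after ingest, so the
    // processing stage can run as up to 8 parallel tasks.
    val spread = stream.repartition(8)

    spread.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // Placeholder per-record work; replace with the real processing.
        records.foreach { case (_, message) => println(message) }
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Note that repartition() shuffles each batch, so it only affects the processing
side; it does not change how the data is read from Kafka.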


On Thu, Mar 27, 2014 at 11:09 AM, Scott Clasen <scott.clasen@gmail.com> wrote:

> I have a simple streaming job that creates a Kafka input stream on a topic
> with 8 partitions, and does a foreachRDD.
>
> The job and tasks are running on Mesos, and there are two tasks running,
> but only 1 task doing anything.
>
> I also set spark.streaming.concurrentJobs=8  but still there is only 1 task
> doing work. I would have expected that each task took a subset of the
> partitions.
>
> Is there a way to make more than one task share the work here?  Are my
> expectations off here?
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/KafkaInputDStream-mapping-of-partitions-to-tasks-tp3360.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
