spark-user mailing list archives

From Tathagata Das <tathagata.das1...@gmail.com>
Subject Re: combineByKey at ShuffledDStream.scala
Date Wed, 23 Jul 2014 05:05:43 GMT
Can you give an idea of the streaming program? What are the rest of the
transformations you are doing on the input streams?


On Tue, Jul 22, 2014 at 11:05 AM, Bill Jay <bill.jaypeterson@gmail.com>
wrote:

> Hi all,
>
> I am currently running a Spark Streaming program, which consumes data from
> Kafka and performs a group-by operation on the data. I am trying to optimize
> the running time of the program because it looks slow to me. It seems the
> stage named:
>
> * combineByKey at ShuffledDStream.scala:42 *
>
> always takes the longest to run. And if I open this stage, I only see two
> executors on it. Does anyone have an idea what this stage does and how to
> speed it up? Thanks!
>
> Bill
>
