spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Bill Jay <bill.jaypeter...@gmail.com>
Subject combineByKey at ShuffledDStream.scala
Date Tue, 22 Jul 2014 18:05:55 GMT
Hi all,

I am currently running a Spark Streaming program, which consumes data from
Kakfa and does the group by operation on the data. I try to optimize the
running time of the program because it looks slow to me. It seems the stage
named:

* combineByKey at ShuffledDStream.scala:42 *

always takes the longest running time. And If I open this stage, I only see
two executors on this stage. Does anyone has an idea what this stage does
and how to increase the speed for this stage? Thanks!

Bill

Mime
View raw message