spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From onpoq l <>
Subject How to achieve reasonable performance on Spark Streaming?
Date Mon, 09 Jun 2014 16:48:33 GMT
Dear All,

I recently installed Spark 1.0.0 on a 10-slave dedicate cluster. However,
the max input rate that the system can sustain with stable latency seems
very low. I use a simple word counting workload over tweets:


With 2s batch interval, the 10-slave cluster can only handle ~ 30,000
tweets/s (which translates to ~ 300,000 words/s). To give you a sense about
the speed of a slave machine,  a single machine can handle ~ 100,000
tweets/s on a stream processing program in plain java.

I've tuned the following parameters without seeing obvious improvement:
1. Batch interval: 1s, 2s, 5s, 10s
2. Parallelism: 1 x total num of cores, 2x, 3x
4. Run type: yarn-client, standalone cluster

* My first question is: what are the max input rates you have observed on
Spark Streaming? I know it depends on the workload and the hardware. But I
just want to get some sense of the reasonable numbers.

* My second question is: any suggestion on what I can tune to improve the
performance? I've found unexpected delays in "reduce" that I can't explain,
and they may be related to the poor performance. Details are shown below

============= DETAILS =============

Below is the CPU utilization plot with 2s batch interval and 40,000
tweets/s. The latency keeps increasing while the CPU, network, disk and
memory are all under utilized.

I tried to find out which stage is the bottleneck. It seems that the
"reduce" phase for each batch can usually finish in less than 0.5s, but
sometimes (70 out of 545 batches) takes 5s. Below is a snapshot of the wet
UI showing the time taken by "reduce" in some batches where the normal
cases are marked in green and the abnormal case is marked in red:

I further look into all the tasks of a slow "reduce" stage. As shown by the
below snapshot, a small portion of the tasks are stragglers:

Here is the log of some slow "reduce" tasks on an executor, where the start
and end of the tasks are marked in red. They started at 21:55:43, and
completed at 21:55:48. During the 5s, I can only see shuffling at the
beginning and activities of input blocks.


For comparison, here is the log of the normal "reduce" tasks on the same

It seems that these slow "reduce" stages keep latency increasing. However,
I can't explain what causes the straggler tasks in the "reduce" stages.
Anybody has any idea?


View raw message