spark-user mailing list archives

From Ashic Mahtab <>
Subject RE: Are these numbers abnormal for spark streaming?
Date Thu, 22 Jan 2015 11:11:38 GMT
Hate to do this...but...erm...bump? Would really appreciate input from others using Streaming.
Or at least some docs that would tell me if these are expected or not.

Subject: Are these numbers abnormal for spark streaming?
Date: Wed, 21 Jan 2015 11:26:31 +0000

Hi Guys,
I've got Spark Streaming set up for a low data rate system (using spark's features for analysis,
rather than high throughput). Messages are coming in throughout the day, at around 1-20 per
second (finger in the air estimate...not analysed yet).  In the spark streaming UI for the
application, I'm getting the following after 17 hours.

Streaming
  Started at: Tue Jan 20 16:58:43 GMT 2015
  Time since start: 18 hours 24 minutes 34 seconds
  Network receivers: 2
  Batch interval: 2 seconds
  Processed batches: 16482
  Waiting batches:

Statistics over last 100 processed batches

Receiver Statistics
  Receiver: RmqReceiver-0
  Status: ACTIVE
  Location: FOOOO
  Records in last batch [2015/01/21 11:23:18], min/median/max rate [records/sec]: 124726
  Last error: -

Batch Processing Statistics
  Metric           | Last batch    | Minimum       | 25th pctile   | Median        | 75th pctile   | Maximum
  Processing Time  | 3 s 994 ms    | 157 ms        | 4 s 16 ms     | 4 s 961 ms    | 5 s 3 ms      | 5 s 171 ms
  Scheduling Delay | 9 h 15 m 4 s  | 9 h 10 m 54 s | 9 h 11 m 56 s | 9 h 12 m 57 s | 9 h 14 m 5 s  | 9 h 15 m 4 s
  Total Delay      | 9 h 15 m 8 s  | 9 h 10 m 58 s | 9 h 12 m      | 9 h 13 m 2 s  | 9 h 14 m 10 s | 9 h 15 m 8 s
Are these numbers normal? In particular, what do the "scheduling delay" and "total delay"
metrics mean, and is it normal for them to be around 9 hours?
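For what it's worth, the figures above can be sanity-checked with simple arithmetic (a back-of-envelope sketch using only the numbers reported by the UI, assuming batches are processed one at a time):

```python
# Back-of-envelope check: does the ~9-hour delay follow from the other
# figures in the streaming UI above?
elapsed_s = 18 * 3600 + 24 * 60 + 34   # time since start: 18 h 24 m 34 s
processed = 16482                       # processed batches
batch_interval_s = 2                    # batch interval: 2 seconds

# If batches are processed serially, the average processing time per batch is:
avg_proc_s = elapsed_s / processed      # ~4.02 s per 2 s batch

# Batch n arrives at roughly n * interval but finishes at roughly
# n * avg_proc, so the newest batch's total delay is the difference:
predicted_delay_s = processed * (avg_proc_s - batch_interval_s)

print(round(predicted_delay_s))         # ~33310 s, i.e. about 9 h 15 m
```

That predicted figure lands within a few seconds of the reported total delay of 9 h 15 m 8 s (33308 s), which suggests the delay is simply the backlog that accumulates when a 2-second batch consistently takes ~4 seconds to process.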

I've got a standalone spark master and 4 spark nodes. The streaming app has been given 4 cores,
and it's using 1 core per worker node. The streaming app is submitted from a 5th machine,
and that machine has nothing but the driver running. The worker nodes are running alongside
Cassandra (and reading and writing to it).

Any insights would be appreciated.
