spark-user mailing list archives

From Sudipta Banerjee <asudipta.baner...@gmail.com>
Subject Re: Are these numbers abnormal for spark streaming?
Date Thu, 22 Jan 2015 15:12:43 GMT
Hi Ashic Mahtab,

Are Cassandra and ZooKeeper installed as part of the YARN architecture, or
are they installed in a separate layer alongside Apache Spark?

Thanks and Regards,
Sudipta

On Thu, Jan 22, 2015 at 8:13 PM, Ashic Mahtab <ashic@live.com> wrote:

> Hi Guys,
> So I changed the interval to 15 seconds. There are obviously a lot more
> messages per batch, but (I think) it looks a lot healthier. Can you see any
> major warning signs? I think that with 2-second intervals, the per-partition
> setup/teardown was what was causing the delays.
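For reference, the only code change that implies is the batch duration passed to the streaming context. A minimal PySpark sketch, assuming a driver set up roughly like this (the app name is invented):

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="rmq-streaming")  # hypothetical app name
ssc = StreamingContext(sc, 15)              # batch interval: was 2, now 15 seconds
```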
>
> Streaming
>
>    - *Started at: *Thu Jan 22 13:23:12 GMT 2015
>    - *Time since start: *1 hour 17 minutes 16 seconds
>    - *Network receivers: *2
>    - *Batch interval: *15 seconds
>    - *Processed batches: *309
>    - *Waiting batches: *0
>
>
>
> Statistics over last 100 processed batches
>
> Receiver Statistics
>
>    Receiver      | Status | Location           | Records in last batch [2015/01/22 14:40:29] | Minimum rate [records/sec] | Median rate [records/sec] | Maximum rate [records/sec] | Last Error
>    RmqReceiver-0 | ACTIVE | VDCAPP53.foo.local | 2.6 K | 29 | 106 | 295 | -
>    RmqReceiver-1 | ACTIVE | VDCAPP50.bar.local | 2.6 K | 29 | 107 | 291 | -
> Batch Processing Statistics
>
>    Metric           | Last batch | Minimum    | 25th percentile | Median     | 75th percentile | Maximum
>    Processing Time  | 4 s 812 ms | 4 s 698 ms | 4 s 738 ms      | 4 s 761 ms | 4 s 788 ms      | 5 s 802 ms
>    Scheduling Delay | 2 ms       | 0 ms       | 3 ms            | 3 ms       | 4 ms            | 9 ms
>    Total Delay      | 4 s 814 ms | 4 s 701 ms | 4 s 739 ms      | 4 s 764 ms | 4 s 792 ms      | 5 s 809 ms
>
>
> Regards,
> Ashic.
> ------------------------------
> From: ashic@live.com
> To: gerard.maas@gmail.com
> CC: user@spark.apache.org
> Subject: RE: Are these numbers abnormal for spark streaming?
> Date: Thu, 22 Jan 2015 12:32:05 +0000
>
>
> Hi Gerard,
> Thanks for the response.
>
> The messages get deserialised from msgpack format, and one of the strings
> is deserialised to JSON. Certain fields are checked to decide if further
> processing is required. If so, the message goes through a series of
> in-memory filters to check if more processing is required. Only then does
> the "heavy" work start. That consists of a few db queries, and potential
> updates to the db plus a message on a message queue. The majority of
> messages don't need processing. At peak, about three messages every other
> second need processing.
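A rough sketch of that gating logic in plain Python (field names, the "interesting" kind, and thresholds are all invented for illustration; the msgpack step is stubbed out and only the JSON stage is shown):

```python
import json

# Hypothetical shape of the cheap pre-filtering described above: decode,
# check a few fields, then run in-memory predicate filters; only messages
# passing every stage reach the "heavy" db work.

def needs_heavy_work(body: str) -> bool:
    payload = json.loads(body)                  # one string field is JSON
    if payload.get("kind") != "interesting":    # cheap field check first
        return False
    in_mem_filters = [
        lambda p: p.get("value", 0) > 10,       # invented thresholds
        lambda p: not p.get("already_seen", False),
    ]
    return all(f(payload) for f in in_mem_filters)
```

With most messages failing the first check, only a handful per second would ever reach the database.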
>
> One possible thing that might be happening is the session initialisation
> and prepared-statement initialisation for each partition. I can resort to
> some tricks, but I think I'll try increasing the batch interval to 15
> seconds. I'll report back with findings.
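The trade-off being weighed can be sketched in plain Python (no Spark here; `FakeSession` is a stand-in for a real Cassandra session, and the only point is counting expensive initialisations, which happen once per partition in the foreachPartition-style idiom):

```python
class FakeSession:
    """Stand-in for an expensive-to-create db session."""
    opened = 0

    def __init__(self):
        FakeSession.opened += 1   # each construction models setup cost

    def write(self, record):
        return record

def write_partition(records):
    # One session (and its prepared statements) per *partition*,
    # reused for every record in it.
    session = FakeSession()
    for r in records:
        session.write(r)

# With a 2 s interval there are many small partitions, so setup dominates;
# a 15 s interval amortises the same per-partition cost over more records.
```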
>
> Thanks,
> Ashic.
>
> ------------------------------
> From: gerard.maas@gmail.com
> Date: Thu, 22 Jan 2015 12:30:08 +0100
> Subject: Re: Are these numbers abnormal for spark streaming?
> To: tathagata.das1565@gmail.com
> CC: ashic@live.com; tdas@databricks.com; user@spark.apache.org
>
> and post the code (if possible).
> In a nutshell, your processing time > batch interval,  resulting in an
> ever-increasing delay that will end up in a crash.
> 3 secs to process 14 messages looks like a lot. Curious what the job logic
> is.
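Gerard's point can be checked with a toy model: if every batch takes `proc` seconds to process but a new batch arrives every `interval` seconds, the backlog grows by roughly `proc - interval` per batch (a simplification that ignores variance):

```python
def scheduling_delay(proc: float, interval: float, n_batches: int) -> float:
    """Rough accumulated backlog after n_batches, in seconds."""
    delay = 0.0
    for _ in range(n_batches):
        delay = max(0.0, delay + proc - interval)
    return delay

# ~4 s of work per 2 s batch over 16482 batches gives roughly 33000 s of
# delay -- the same ballpark as the 9 h 15 m reported in this thread.
```

`scheduling_delay(4, 2, 16482)` comes to 32964 seconds, just over nine hours, which is consistent with the numbers Ashic posted.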
>
> -kr, Gerard.
>
> On Thu, Jan 22, 2015 at 12:15 PM, Tathagata Das <
> tathagata.das1565@gmail.com> wrote:
>
> This is not normal. It's a huge scheduling delay!! Can you tell me more
> about the application?
> - cluster setup, number of receivers, what's the computation, etc.
>
> On Thu, Jan 22, 2015 at 3:11 AM, Ashic Mahtab <ashic@live.com> wrote:
>
> Hate to do this...but...erm...bump? Would really appreciate input from
> others using Streaming. Or at least some docs that would tell me if these
> are expected or not.
>
> ------------------------------
> From: ashic@live.com
> To: user@spark.apache.org
> Subject: Are these numbers abnormal for spark streaming?
> Date: Wed, 21 Jan 2015 11:26:31 +0000
>
>
> Hi Guys,
> I've got Spark Streaming set up for a low data rate system (using spark's
> features for analysis, rather than high throughput). Messages are coming in
> throughout the day, at around 1-20 per second (finger in the air
> estimate...not analysed yet).  In the spark streaming UI for the
> application, I'm getting the following after 17 hours.
>
> Streaming
>
>    - *Started at: *Tue Jan 20 16:58:43 GMT 2015
>    - *Time since start: *18 hours 24 minutes 34 seconds
>    - *Network receivers: *2
>    - *Batch interval: *2 seconds
>    - *Processed batches: *16482
>    - *Waiting batches: *1
>
>
>
> Statistics over last 100 processed batches
>
> Receiver Statistics
>
>    Receiver      | Status | Location | Records in last batch [2015/01/21 11:23:18] | Minimum rate [records/sec] | Median rate [records/sec] | Maximum rate [records/sec] | Last Error
>    RmqReceiver-0 | ACTIVE | FOOOO    | 14 | 4 | 7 | 27 | -
>    RmqReceiver-1 | ACTIVE | BAAAAR   | 12 | 4 | 7 | 26 | -
> Batch Processing Statistics
>
>    Metric           | Last batch    | Minimum       | 25th percentile | Median        | 75th percentile | Maximum
>    Processing Time  | 3 s 994 ms    | 157 ms        | 4 s 16 ms       | 4 s 961 ms    | 5 s 3 ms        | 5 s 171 ms
>    Scheduling Delay | 9 h 15 m 4 s  | 9 h 10 m 54 s | 9 h 11 m 56 s   | 9 h 12 m 57 s | 9 h 14 m 5 s    | 9 h 15 m 4 s
>    Total Delay      | 9 h 15 m 8 s  | 9 h 10 m 58 s | 9 h 12 m        | 9 h 13 m 2 s  | 9 h 14 m 10 s   | 9 h 15 m 8 s
>
>
> Are these "normal"? I was wondering what the scheduling delay and total
> delay terms mean, and whether it's normal for them to be 9 hours.
>
> I've got a standalone spark master and 4 spark nodes. The streaming app
> has been given 4 cores, and it's using 1 core per worker node. The
> streaming app is submitted from a 5th machine, and that machine has nothing
> but the driver running. The worker nodes are running alongside Cassandra
> (and reading and writing to it).
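A hypothetical submit command matching that layout (standalone master, 4 cores total spread one per worker, driver in client mode on the separate fifth machine; host, class, and jar names are all invented):

```shell
spark-submit \
  --master spark://spark-master:7077 \
  --deploy-mode client \
  --total-executor-cores 4 \
  --class com.example.StreamingApp \
  streaming-app.jar
```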
>
> Any insights would be appreciated.
>
> Regards,
> Ashic.
>
>
>
>


-- 
Sudipta Banerjee
Consultant, Business Analytics and Cloud Based Architecture
Call me +919019578099
