spark-user mailing list archives

From Tathagata Das <tathagata.das1...@gmail.com>
Subject Re: Long running Spark Streaming Job increasing executing time per batch
Date Fri, 20 Jun 2014 21:43:28 GMT
In the Spark web UI, you should see the same pattern of stages repeating
over time, as the same sequence of stages gets computed in every batch. From
that you should be able to get a sense of how long corresponding stages take
across different batches, and which stage is actually taking more time
after a while.
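
One rough way to do that comparison, once you have noted down a few stage
durations from the UI by hand, is sketched below in Python. This is only an
illustration: the stage labels ("stage A", "stage B"), the batch indices, and
all the numbers are made up, and the helper names are hypothetical rather
than part of any Spark API.

```python
from collections import defaultdict


def stage_growth(samples):
    """Return {stage_label: last_duration - first_duration}.

    `samples` is an iterable of (batch_index, stage_label, seconds) tuples,
    e.g. copied by hand from the stages page of the web UI.
    """
    by_stage = defaultdict(list)
    for batch, stage, seconds in samples:
        by_stage[stage].append((batch, seconds))

    growth = {}
    for stage, points in by_stage.items():
        points.sort()  # order observations by batch index
        growth[stage] = points[-1][1] - points[0][1]
    return growth


def fastest_growing_stage(samples):
    """Return the label of the stage whose duration grew the most."""
    growth = stage_growth(samples)
    return max(growth, key=growth.get)


# Made-up observations: (batch index, hypothetical stage label, seconds).
samples = [
    (1, "stage A", 0.5), (1, "stage B", 1.0),
    (100, "stage A", 0.5), (100, "stage B", 2.5),
    (200, "stage A", 0.5), (200, "stage B", 4.0),
]

print(fastest_growing_stage(samples))
```

A stage whose duration stays flat across batches is unlikely to be the
culprit; one that keeps climbing is the place to look first.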






On Thu, Jun 19, 2014 at 3:43 PM, Skogberg, Fredrik <
Fredrik.Skogberg@paddypower.com> wrote:

>
> Hi TD,
>
> >That's quite odd. Yes, with checkpoint the lineage does not increase. Can
> >you tell which stage in the processing of each batch is causing the
> >increase in the processing time?
>
> I haven’t been able to determine exactly which stage is causing the
> increase in processing time. Any pointers on what I should be looking out
> for? I’ve just monitored the “Total delay” and “execution times” parts of
> the driver log.
>
> >Also, what is the batch interval, and checkpoint interval?
>
> The batch interval was set to a somewhat conservative 10 seconds, and the
> checkpoint interval, I guess, is the default derived from that, since I use
> updateStateByKey (as I understand it, using that function implies that the
> stream will be checkpointed).
>
> Regards,
> Fred
>
