spark-dev mailing list archives

From Sachin Aggarwal <different.sac...@gmail.com>
Subject Re: submissionTime vs batchTime, DirectKafka
Date Thu, 10 Mar 2016 04:59:51 GMT
Hi Cody,

Let me try once again to explain with an example.

In Spark's BatchInfo class, "scheduling delay" is defined as

    def schedulingDelay: Option[Long] = processingStartTime.map(_ - submissionTime)

I am dumping the BatchInfo object in my LatencyListener, which extends
StreamingListener.

batchTime = 1457424695400 ms

submissionTime = 1457425630780 ms

difference = 935380 ms
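To make the distinction concrete, here is a minimal plain-Scala sketch. `BatchTimes` is a hypothetical stand-in for the three BatchInfo timestamps discussed here, not Spark's own class. It shows that the definition above measures only from submissionTime onwards, so the 935380 ms between batchTime and submissionTime is not included in schedulingDelay. processingStartTime was not reported in the dump, so it is left as None.

```scala
// Hypothetical stand-in for the timestamps carried by Spark's BatchInfo
// (epoch milliseconds). This is NOT Spark's actual class.
final case class BatchTimes(
    batchTimeMs: Long,
    submissionTimeMs: Long,
    processingStartTimeMs: Option[Long]
) {
  // Mirrors the quoted definition: processingStartTime.map(_ - submissionTime)
  def schedulingDelay: Option[Long] =
    processingStartTimeMs.map(_ - submissionTimeMs)

  // Time between the batch's nominal time and its submission to the
  // scheduler queue; this interval is not part of schedulingDelay.
  def batchToSubmissionGap: Long = submissionTimeMs - batchTimeMs
}

// Values taken from the dump above; processingStartTime was not reported.
val observed = BatchTimes(1457424695400L, 1457425630780L, None)

println(s"batchTime -> submissionTime gap: ${observed.batchToSubmissionGap} ms")
println(f"that is about ${observed.batchToSubmissionGap / 60000.0}%.1f minutes")
println(s"schedulingDelay: ${observed.schedulingDelay}")
```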

Can this be considered a lag in processing events? What is a possible
explanation for this lag?
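The queueing effect described in the reply quoted below (one batch is processed at a time, so delays build up when processing takes longer than the batch interval) can be sketched as a toy model. The function `queueDelays` and its parameters are hypothetical, not part of Spark. It assumes every batch takes the same time, batches become ready exactly at multiples of the interval, and only one batch runs at a time. It reports how long each batch waits after its batch time before processing starts; it does not model where Spark records that wait (schedulingDelay versus the batchTime to submissionTime gap).

```scala
// Toy model (assumptions, not Spark's scheduler code): batch i is ready at
// i * intervalMs, each batch needs processingMs, and batches run strictly
// one after another. Returns each batch's wait before processing starts.
def queueDelays(intervalMs: Long, processingMs: Long, numBatches: Int): Seq[Long] =
  (0 until numBatches)
    .foldLeft((0L, Vector.empty[Long])) { case ((previousEndMs, delays), i) =>
      val batchTimeMs = i * intervalMs
      // A batch cannot start before its batch time or before the previous one ends.
      val startMs = math.max(batchTimeMs, previousEndMs)
      (startMs + processingMs, delays :+ (startMs - batchTimeMs))
    }
    ._2

println(queueDelays(300L, 400L, 5)) // Vector(0, 100, 200, 300, 400)
println(queueDelays(300L, 200L, 5)) // Vector(0, 0, 0, 0, 0)
```

With a 300 ms interval and a hypothetical 400 ms of work per batch, each batch starts 100 ms later than the one before, so after n batches the wait is n * (processing time - interval) and keeps growing; when processing is faster than the interval, the wait stays at zero.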

On Thu, Mar 10, 2016 at 12:22 AM, Cody Koeninger <cody@koeninger.org> wrote:

> I'm really not sure what you're asking.
>
> On Wed, Mar 9, 2016 at 12:43 PM, Sachin Aggarwal
> <different.sachin@gmail.com> wrote:
> > Where are we capturing this delay?
> > I am aware of the scheduling delay, which is defined as processing start
> > time minus submission time, not minus the batch creation time.
> >
> > On Wed, Mar 9, 2016 at 10:46 PM, Cody Koeninger <cody@koeninger.org> wrote:
> >>
> >> Spark Streaming by default will not start processing a batch until the
> >> current batch is finished. So if your processing time is larger than
> >> your batch interval, delays will build up.
> >>
> >> On Wed, Mar 9, 2016 at 11:09 AM, Sachin Aggarwal
> >> <different.sachin@gmail.com> wrote:
> >> > Hi All,
> >> >
> >> > We have batchTime and submissionTime:
> >> >
> >> > @param batchTime   Time of the batch
> >> >
> >> > @param submissionTime  Clock time of when jobs of this batch was submitted
> >> > to the streaming scheduler queue
> >> >
> >> > 1) With direct Kafka and small batches (300 ms), we are seeing a
> >> > difference between batchTime and submissionTime that can reach minutes.
> >> > We see this only when the processing time exceeds the batch interval.
> >> > How can we explain this delay?
> >> >
> >> > 2) When the batch processing time exceeds the batch interval, will
> >> > Spark fetch the next batch's data from Kafka in parallel with processing
> >> > the current batch, or will it wait for the current batch to finish first?
> >> >
> >> > I would be grateful for any pointers.
> >> >
> >> > Thanks!
> >> > --
> >> >
> >> > Thanks & Regards
> >> >
> >> > Sachin Aggarwal
> >> > 7760502772
>



-- 

Thanks & Regards

Sachin Aggarwal
7760502772
