spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Diego <>
Subject Windowed Operations
Date Thu, 25 Sep 2014 11:45:46 GMT
Hi everyone, 

I'm trying to understand the windowed operations functioning. What I want to
achieve is the following: 

    val ssc = new StreamingContext(sc, Seconds(1)) 

    val lines = ssc.socketTextStream("localhost", 9999) 

    val window5 = lines.window(Seconds(5),Seconds(5)).reduce((time1: Long, 
time2:Long) => time1 + time2) //basically the sum of all the numbers in a
sliding window of 5 seconds each 5 seconds 

    val window10 = window5.window(Seconds(10),Seconds(10)).reduce((time1:
Long,  time2:Long) => time1 + time2) //here I'm wanting to reuse the already
calculated RDDs from window5. So I'm expecting 2 RDDs from the window5. 

    val window20 = window10.window(Seconds(20),Seconds(20)).reduce((time1:
Long,  time2:Long) => time1 + time2) //the same as in window10 but in this
case I expect two RDDs from window10. 

The thing is that sometimes window10 does not receive any RDD. Looking at
the logs and the code it seems that the time in the interval gets invalid.
In the code I see that WindowedDStream computes the interval doing the

  val currentWindow = new Interval(validTime - windowDuration +
parent.slideDuration, validTime) 
    val rddsInWindow = parent.slice(currentWindow) 

In the case of window10, the resulting slice is from  validTime - 10 seconds
+ 5 seconds to validTime. I don't understand why the parent slideDuration is
taking into account in this calculation. Could you please help me understand
this logic?. 
Do you see something wrong in the code? Is there another way to achieve the
same thing? 



View this message in context:
Sent from the Apache Spark User List mailing list archive at

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message