spark-user mailing list archives

From "萝卜丝炒饭" <1427357...@qq.com>
Subject Re: How to use ManualClock with Spark streaming
Date Tue, 21 Mar 2017 05:58:56 GMT
Hi Hemalatha,

You can use time windows; it looks like:

df.groupBy(window($"timestamp", "20 seconds", "10 seconds"))
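
For context, a slightly fuller sketch of that approach in the Scala Structured Streaming API. The broker address, topic name, and the Kafka source's built-in "timestamp" column are assumptions here, and the watermark is one way to absorb a fixed 10s arrival delay:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.window

val spark = SparkSession.builder()
  .master("local[2]")
  .appName("window-example")
  .getOrCreate()
import spark.implicits._

// Read from Kafka (requires the spark-sql-kafka-0-10 package);
// broker address and topic name are placeholders.
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "events")
  .load()

// 20-second windows sliding every 10 seconds, keyed on event time.
// The 10-second watermark tolerates events that arrive 10s late.
val counts = df
  .withWatermark("timestamp", "10 seconds")
  .groupBy(window($"timestamp", "20 seconds", "10 seconds"))
  .count()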

---Original---
From: "Saisai Shao"<sai.sai.shao@gmail.com>
Date: 2017/3/1 09:39:58
To: "Hemalatha A"<hemalatha.amrutha@googlemail.com>;
Cc: "spark users"<user@spark.apache.org>;
Subject: Re: How to use ManualClock with Spark streaming


I don't think using ManualClock is the right way to fix your problem here in Spark Streaming.

ManualClock in Spark is mainly used for unit tests, where the test itself manually advances the
time to make the test work. That usage looks different from the scenario you mentioned.
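
For reference, a minimal sketch of that unit-test pattern. Note that org.apache.spark.util.ManualClock is private[spark], so code like this only compiles inside Spark's own source tree; its streaming test suites are the main user:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("manual-clock-demo")
  // Internal setting read by the streaming scheduler at startup;
  // it defaults to org.apache.spark.util.SystemClock.
  .set("spark.streaming.clock", "org.apache.spark.util.ManualClock")
val ssc = new StreamingContext(conf, Seconds(1))
// ... define input streams and output operations here ...
ssc.start()

// No batch fires until the test advances the clock by hand. The exact
// accessor has moved between versions, but Spark's streaming tests do
// roughly this (ssc.scheduler is private[streaming], hence test-only):
// ssc.scheduler.clock.asInstanceOf[ManualClock].advance(1000)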


Thanks
Jerry


On Tue, Feb 28, 2017 at 10:53 PM, Hemalatha A <hemalatha.amrutha@googlemail.com> wrote:

Hi,


I am running a streaming application that reads data from Kafka and performs window operations
on it. I have a use case where all incoming events have a fixed latency of 10s, which
means data belonging to minute 10:00:00 will arrive 10s late, at 10:00:10.


I want to set the Spark clock to ManualClock and set the time behind by 10s, so that the
batch calculation triggers at 10:00:10, by which time all the events for the previous
minute have arrived.


But I see that "spark.streaming.clock" is hardcoded to "org.apache.spark.util.SystemClock"
in the code.


Is there a way to easily hack this property to use ManualClock?

-- 


Regards
Hemalatha