spark-user mailing list archives

From KevinZwx <kevinzwx1...@gmail.com>
Subject [Structured Streaming]Data processing and output trigger should be decoupled
Date Wed, 30 Aug 2017 16:26:01 GMT
Hi,

I'm working with Structured Streaming, and I'm wondering whether the
trigger mechanism could be improved.

Currently, when I specify a trigger, e.g. trigger(Trigger.ProcessingTime("10
minutes")), the engine begins processing data at the moment each trigger
fires: 10:00:00, 10:10:00, 10:20:00, and so on. If the engine takes 10s to
process a batch, we get the output at 10:00:10, and then the engine just
waits without processing any data. When the next trigger fires, the engine
begins processing the data that arrived during the interval, and if this
time it takes 15s to process the batch, we get the result at 10:10:15.
This is the problem: the output is always delayed by the processing time,
and the engine sits idle between triggers.
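
To make the timeline concrete, here is a small plain-Python model of the
behaviour described above. It is only a sketch, not Spark code: the function
name output_times is made up for illustration, and it assumes every batch
finishes before the next trigger fires.

```python
from datetime import datetime, timedelta


def output_times(first_trigger, interval, processing_durations):
    """Model of the current behaviour: each batch starts when its trigger
    fires, and its result appears once processing has finished."""
    results = []
    for i, duration in enumerate(processing_durations):
        batch_start = first_trigger + i * interval  # 10:00:00, 10:10:00, ...
        results.append(batch_start + duration)      # output lags by the processing time
    return results


first = datetime(2017, 8, 30, 10, 0, 0)
times = output_times(
    first,
    timedelta(minutes=10),
    [timedelta(seconds=10), timedelta(seconds=15)],
)
for t in times:
    print(t.strftime("%H:%M:%S"))
# prints 10:00:10 and then 10:10:15
```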

In my understanding, the trigger and data processing should be decoupled:
the engine should keep processing data as fast as possible, but only
generate output at each trigger, so that we get the results at 10:00:00,
10:10:00, 10:20:00, ... Is there any solution for this, or any plan to
work on it?
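
As a rough illustration of what I mean by decoupling, here is a plain-Python
sketch of the proposed behaviour. It is a hypothetical model, not an existing
Spark API: the function decoupled_outputs and the simple running count are
made up for the example. State is updated as events arrive, so each trigger
only has to snapshot a result that is already up to date.

```python
from datetime import datetime, timedelta


def decoupled_outputs(event_times, first_trigger, interval, num_triggers):
    """Hypothetical model: the running count is maintained continuously,
    and each trigger only emits the state accumulated so far."""
    events = sorted(event_times)
    outputs = []
    count = 0  # running aggregate, updated as events arrive
    i = 0
    for k in range(num_triggers):
        trigger = first_trigger + k * interval
        # Events that arrived before this trigger were already folded in.
        while i < len(events) and events[i] < trigger:
            count += 1
            i += 1
        outputs.append((trigger, count))  # emitted exactly at the trigger time
    return outputs


day = datetime(2017, 8, 30)
arrivals = [
    day.replace(hour=9, minute=55),
    day.replace(hour=10, minute=3),
    day.replace(hour=10, minute=7),
    day.replace(hour=10, minute=12),
]
snapshots = decoupled_outputs(
    arrivals, day.replace(hour=10), timedelta(minutes=10), 3
)
for when, total in snapshots:
    print(when.strftime("%H:%M:%S"), total)
# prints 10:00:00 1, then 10:10:00 3, then 10:20:00 4
```

In this model the output timestamps fall on the trigger boundaries
themselves, instead of trailing them by the batch processing time.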



