spark-user mailing list archives

From "Eduardo D'Avila" <>
Subject [Structured Streaming] Trying to use Spark structured streaming
Date Mon, 11 Sep 2017 15:04:56 GMT

I'm trying to use Spark 2.1.1 Structured Streaming to *count the number of
records* from Kafka *for each time window*, using the code in this GitHub gist.

I expected that, *once each minute* (the slide duration), it would *output
a single record* (since the only aggregation key is the window) with the
*record count for the last 5 minutes* (the window duration). However, it
outputs several records 2-3 times per minute, as in the sample output
included in the gist.
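
To make the window arithmetic concrete, here is a minimal plain-Python
sketch (not Spark, and not the code from the gist) of a 5-minute window
sliding every minute. It assumes windows are aligned to multiples of the
slide duration, as with Spark's default start offset; `windows_for` and
`count_by_window` are hypothetical names used only for illustration. Under
that assumption each record falls into five overlapping windows, so a count
grouped by window holds several rows at any moment.

```python
from collections import Counter

WINDOW = 5 * 60  # window duration in seconds (assumed: 5 minutes)
SLIDE = 60       # slide duration in seconds (assumed: 1 minute)


def windows_for(ts, window=WINDOW, slide=SLIDE):
    """Return the (start, end) pairs of every sliding window containing ts.

    ts is an integer timestamp in seconds. Windows are half-open,
    [start, end), and start at multiples of the slide duration.
    """
    last_start = ts - ts % slide  # latest window start at or before ts
    starts = range(last_start, ts - window, -slide)
    return sorted((s, s + window) for s in starts)


def count_by_window(timestamps):
    """Count records per window, like a group-by on the window key."""
    counts = Counter()
    for ts in timestamps:
        for w in windows_for(ts):
            counts[w] += 1
    return dict(counts)
```

For example, `windows_for(630)` yields five windows, from `(360, 660)` to
`(600, 900)`, so a single record arriving at second 630 already touches
five aggregation rows.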

Changing the output mode to "append" seems to change the behavior, but the
result is still far from what I expected.

What is wrong with my assumptions about how it should work? Given the
code, how should the sample output be interpreted or used?


