spark-user mailing list archives

From "Eduardo D'Avila" <eduardo.dav...@corp.globo.com>
Subject [Structured Streaming] Trying to use Spark structured streaming
Date Mon, 11 Sep 2017 15:04:56 GMT
Hi,

I'm trying to use Spark 2.1.1 Structured Streaming to *count the number of
records* from Kafka *for each time window*, using the code in this GitHub gist
<https://gist.github.com/erdavila/b6ab0c216e82ae77fa8192c48cb816e4>.

I expected that, *once each minute* (the slide duration), it would *output
a single record* (since the only aggregation key is the window) with the
*record count for the last 5 minutes* (the window duration). However, it
outputs several records 2-3 times per minute, as in the sample output
included in the gist.
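
To make the window arithmetic above concrete, here is a minimal sketch in plain Python. It is not the code from the gist and it does not use Spark at all. It assumes that sliding windows are half-open intervals [start, end) whose start times are multiples of the slide duration counted from the Unix epoch; the helper name `windows_for` and the sample timestamp are made up for illustration. Under those assumptions it lists every 5-minute window (sliding by 1 minute) that contains a single record's timestamp.

```python
from datetime import datetime, timedelta, timezone

# Durations taken from the description above: 5-minute window, 1-minute slide.
WINDOW = timedelta(minutes=5)
SLIDE = timedelta(minutes=1)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def windows_for(ts, window=WINDOW, slide=SLIDE):
    """Return all [start, end) windows containing ts (hypothetical helper).

    Assumption: window starts are multiples of `slide` since the Unix epoch.
    """
    # Latest window start that is at or before ts.
    start = ts - ((ts - EPOCH) % slide)
    result = []
    # Walk backwards one slide at a time while the window still covers ts.
    while start + window > ts:
        result.append((start, start + window))
        start -= slide
    return sorted(result)


# Illustrative timestamp only, not taken from the sample output in the gist.
record_time = datetime(2017, 9, 11, 15, 4, 56, tzinfo=timezone.utc)

for start, end in windows_for(record_time):
    print(start.strftime("%H:%M"), "-", end.strftime("%H:%M"))
```

With these durations the sketch lists window_duration / slide_duration = 5 overlapping windows for one record, so a count grouped only by window can have several window rows changing at the same time. Whether this accounts for the output in the gist is only a guess.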

Changing the output mode to "append" seems to change the behavior, but
the result is still far from what I expected.

What is wrong with my assumptions about how it should work? Given the
code, how should the sample output be interpreted or used?

Thanks,

Eduardo
