spark-user mailing list archives

From kant kodali <kanth...@gmail.com>
Subject select count * doesnt seem to respect update mode in Kafka Structured Streaming?
Date Mon, 19 Mar 2018 20:35:53 GMT
Hi All,

I have 10 million records in a Kafka topic and I am just trying to run
spark.sql("select count(*) from kafka_view"). I am reading from Kafka and
writing to Kafka.

My writeStream is set to "update" mode with a trigger interval of one second
(Trigger.ProcessingTime(1000)). I expect the counts to be printed every
second, but it looks like they are printed only after going through all 10M
records. Why?
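
For reference, a minimal sketch of a query shaped like the one described above, written with the PySpark API. The function name, its parameters, the "earliest" starting offsets, and the cast of the count to a string column named "value" are all assumptions made for illustration; the actual job may differ.

```python
def start_count_query(spark, bootstrap_servers, in_topic, out_topic, checkpoint_dir):
    """Count records read from one Kafka topic and write the count to another.

    Hypothetical helper: `spark` is an existing SparkSession, and every
    argument name here is illustrative rather than taken from the original job.
    """
    # Read the input topic as a streaming DataFrame.
    source = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", in_topic)
        # Assumption: the existing 10M records are read from the beginning.
        .option("startingOffsets", "earliest")
        .load()
    )
    source.createOrReplaceTempView("kafka_view")

    # The Kafka sink expects a string or binary column named "value".
    counts = spark.sql(
        "SELECT CAST(count(*) AS STRING) AS value FROM kafka_view"
    )

    # "update" output mode with a one-second processing-time trigger,
    # the PySpark counterpart of Trigger.ProcessingTime(1000).
    return (
        counts.writeStream
        .format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("topic", out_topic)
        .option("checkpointLocation", checkpoint_dir)
        .outputMode("update")
        .trigger(processingTime="1 second")
        .start()
    )
```

It would be called as start_count_query(spark, "localhost:9092", "in_topic", "out_topic", "/tmp/checkpoint"), where the broker address, topic names, and checkpoint path are placeholders.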

Also, it seems to take forever, whereas Linux wc on 10M rows would take
about 30 seconds.

Thanks!
