spark-dev mailing list archives

From Jacek Laskowski <ja...@japila.pl>
Subject [SS] How to create a streaming DataFrame (for a custom Source in Spark 2.4.4 / MicroBatch / DSv1)?
Date Tue, 01 Oct 2019 18:31:55 GMT
Hi,

I think I've got stuck, and without your help I won't get any further.
Please help.

I'm on Spark 2.4.4, developing a streaming Source (DSv1, MicroBatch). In
the getBatch phase, when the Source is asked for a DataFrame, there is
this assert [1] that I can't get past, because every DataFrame I've
managed to create is not streaming:

          assert(batch.isStreaming,
            s"DataFrame returned by getBatch from $source did not have isStreaming=true\n" +
              s"${batch.queryExecution.logical}")

[1]
https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala#L439-L441
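
For context, here is a minimal sketch (names and data are made up) of the kind of getBatch that trips the assert — a DataFrame built the ordinary batch way carries isStreaming == false in its logical plan:

    // Sketch only: a "normal" DataFrame is a batch Dataset, so
    // batch.isStreaming is false and MicroBatchExecution's assert fires.
    override def getBatch(start: Option[Offset], end: Offset): DataFrame = {
      val rows = Seq(("hello", 1), ("world", 2))
      // createDataFrame produces a non-streaming logical plan
      sqlContext.createDataFrame(rows).toDF("word", "count")
    }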

All I could find are private[sql] APIs,
e.g. SQLContext.internalCreateDataFrame(..., isStreaming = true) [2] or [3].

[2]
https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L422-L428
[3]
https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L62-L81
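
One workaround I've seen sketched (not an endorsed public API — the package name below is hypothetical, and this leans on Scala's private[sql] scoping, which makes the method visible to any code declared under org.apache.spark.sql):

    package org.apache.spark.sql.mysource  // hypothetical package, chosen to sit under org.apache.spark.sql

    import org.apache.spark.sql.{DataFrame, SQLContext}
    import org.apache.spark.sql.catalyst.InternalRow
    import org.apache.spark.sql.execution.streaming.{Offset, Source}
    import org.apache.spark.sql.types.{StringType, StructField, StructType}
    import org.apache.spark.unsafe.types.UTF8String

    class MySource(sqlContext: SQLContext) extends Source {

      override val schema: StructType =
        StructType(StructField("value", StringType) :: Nil)

      override def getBatch(start: Option[Offset], end: Offset): DataFrame = {
        // internalCreateDataFrame takes an RDD[InternalRow], so strings
        // have to be wrapped as UTF8String
        val rdd = sqlContext.sparkContext.parallelize(
          Seq(InternalRow(UTF8String.fromString("hello"))))
        // isStreaming = true marks the logical plan as streaming,
        // which is exactly what the assert in MicroBatchExecution checks
        sqlContext.internalCreateDataFrame(rdd, schema, isStreaming = true)
      }

      override def getOffset: Option[Offset] = ???  // elided

      override def stop(): Unit = ()
    }

Whether relying on the private[sql] scope like this is acceptable for a third-party Source is part of my question.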

Pozdrawiam,
Jacek Laskowski
----
https://about.me/JacekLaskowski
The Internals of Spark SQL https://bit.ly/spark-sql-internals
The Internals of Spark Structured Streaming
https://bit.ly/spark-structured-streaming
The Internals of Apache Kafka https://bit.ly/apache-kafka-internals
Follow me at https://twitter.com/jaceklaskowski
