spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jungtaek Lim <kabhwan.opensou...@gmail.com>
Subject Re: [SS] How to create a streaming DataFrame (for a custom Source in Spark 2.4.4 / MicroBatch / DSv1)?
Date Wed, 02 Oct 2019 04:16:28 GMT
Looks like it's missing, or intended to force custom streaming source
implemented as DSv2.

I'm not sure Spark community wants to expand DSv1 API: I could propose the
change if we get some supports here.

To Spark community: given we bring major changes on DSv2, someone would
want to rely on DSv1 while transition from old DSv2 to new DSv2 happens and
new DSv2 gets stabilized. Would we like to provide necessary changes on
DSv1?

Thanks,
Jungtaek Lim (HeartSaVioR)

On Wed, Oct 2, 2019 at 4:27 AM Jacek Laskowski <jacek@japila.pl> wrote:

> Hi,
>
> I think I've got stuck and without your help I won't move any further.
> Please help.
>
> I'm with Spark 2.4.4 and am developing a streaming Source (DSv1,
> MicroBatch) and in getBatch phase when requested for a DataFrame, there is
> this assert [1] I can't seem to go past with any DataFrame I managed to
> create as it's not streaming.
>
>           assert(batch.isStreaming,
>             s"DataFrame returned by getBatch from $source did not have
> isStreaming=true\n" +
>               s"${batch.queryExecution.logical}")
>
> [1]
> https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala#L439-L441
>
> All I could find is private[sql],
> e.g. SQLContext.internalCreateDataFrame(..., isStreaming = true) [2] or [3]
>
> [2]
> https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L422-L428
> [3]
> https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L62-L81
>
> Pozdrawiam,
> Jacek Laskowski
> ----
> https://about.me/JacekLaskowski
> The Internals of Spark SQL https://bit.ly/spark-sql-internals
> The Internals of Spark Structured Streaming
> https://bit.ly/spark-structured-streaming
> The Internals of Apache Kafka https://bit.ly/apache-kafka-internals
> Follow me at https://twitter.com/jaceklaskowski
>
>

Mime
View raw message