spark-user mailing list archives

From Gabor Somogyi <gabor.g.somo...@gmail.com>
Subject Re: Spark Structured Streaming Custom Sources confusion
Date Fri, 28 Jun 2019 11:06:09 GMT
Hi Lars,

Since Structured Streaming doesn't support receivers at all, that
source/sink can't be used.

Data source v2 is under development, and because of that it's a moving
target, so I suggest implementing it with v1 (unless special features are
required from v2).
Additionally, since I've just migrated the Kafka batch source/sink, I can say it's
doable to move from v1 to v2 when the time comes.
(Please see https://github.com/apache/spark/pull/24738. It's worth mentioning that
this is batch and not streaming, but there is a similar PR.)
Dropping v1 will not happen lightning fast in the near future though...
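
For reference, this is roughly the v1 streaming API surface you'd implement
(a minimal sketch, not a working source: the Spark 2.x interfaces
StreamSourceProvider and Source are real, but the provider name, schema, and
the ??? bodies are placeholders you'd fill in for your system):

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.execution.streaming.{Offset, Source}
import org.apache.spark.sql.sources.StreamSourceProvider
import org.apache.spark.sql.types.StructType

// Hypothetical provider class; Spark discovers it via the DataSource
// resolution mechanism (e.g. a META-INF/services registration or the
// fully qualified class name passed to .format(...)).
class MySourceProvider extends StreamSourceProvider {

  override def sourceSchema(
      sqlContext: SQLContext,
      schema: Option[StructType],
      providerName: String,
      parameters: Map[String, String]): (String, StructType) =
    (providerName, schema.getOrElse(new StructType().add("value", "string")))

  override def createSource(
      sqlContext: SQLContext,
      metadataPath: String,
      schema: Option[StructType],
      providerName: String,
      parameters: Map[String, String]): Source =
    new MySource(sqlContext)
}

// The actual source: Spark repeatedly asks for the latest offset and then
// for the batch of data between two offsets.
class MySource(sqlContext: SQLContext) extends Source {
  override def schema: StructType = new StructType().add("value", "string")

  // Latest offset available in the external system, or None if no data yet.
  override def getOffset: Option[Offset] = ???

  // Data in the range (start, end] as a DataFrame matching `schema`.
  override def getBatch(start: Option[Offset], end: Offset): DataFrame = ???

  override def stop(): Unit = ()
}
```

The key contract is that getBatch must be able to replay a range on recovery,
so offsets need to be deterministic and replayable from your system.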

BR,
G


On Tue, Jun 25, 2019 at 10:02 PM Lars Francke <lars.francke@gmail.com>
wrote:

> Hi,
>
> I'm a bit confused about the current state and the future plans of custom
> data sources in Structured Streaming.
>
> So for DStreams we could write a Receiver as documented. Can this be used
> with Structured Streaming?
>
> Then we had the DataSource API with DefaultSource et al., which was (in my
> opinion) never properly documented.
>
> With Spark 2.3 we got a new DataSourceV2 (which also was a marker
> interface), also not properly documented.
>
> Now with Spark 3 this seems to change again (
> https://issues.apache.org/jira/browse/SPARK-25390): the DataSourceV2 marker
> interface is gone, there is still no documentation, but it's somehow still
> called v2?
>
> Can anyone shed some light on the current state of data sources & sinks
> for batch & streaming in Spark 2.4 and 3.x?
>
> Thank you!
>
> Cheers,
> Lars
>
