spark-user mailing list archives

From Gabor Somogyi <gabor.g.somo...@gmail.com>
Subject Re: Spark Structured Streaming Custom Sources confusion
Date Fri, 28 Jun 2019 14:10:40 GMT
Hi Lars,

DSv2 is already used in production.

As for documentation: since Spark is evolving fast, I would take a look at how
the built-in connectors are implemented.

BR,
G


On Fri, Jun 28, 2019 at 3:52 PM Lars Francke <lars.francke@gmail.com> wrote:

> Gabor,
>
> thank you. That is immensely helpful. DataSource v1 it is then. Does that
> mean DSV2 is not really for production use yet?
>
> Any idea what the best documentation would be? I'd probably start by
> looking at existing code.
>
> Cheers,
> Lars
>
> On Fri, Jun 28, 2019 at 1:06 PM Gabor Somogyi <gabor.g.somogyi@gmail.com>
> wrote:
>
>> Hi Lars,
>>
>> Structured Streaming doesn't support receivers at all, so that
>> source/sink can't be used.
>>
>> Data source v2 is under development and is therefore a moving
>> target, so I suggest implementing it with v1 (unless you need features
>> that only v2 provides).
>> Additionally, since I've just migrated the Kafka batch source/sink to v2,
>> I can say it's doable to move from v1 to v2 when the time comes.
>> (Please see https://github.com/apache/spark/pull/24738. It's worth
>> mentioning that this one is batch and not streaming, but there is a similar PR.)
>> That said, dropping v1 won't happen any time soon...
>>
>> BR,
>> G
>>
>>
>> On Tue, Jun 25, 2019 at 10:02 PM Lars Francke <lars.francke@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I'm a bit confused about the current state and the future plans of
>>> custom data sources in Structured Streaming.
>>>
>>> So for DStreams we could write a Receiver as documented. Can this be
>>> used with Structured Streaming?
>>>
>>> Then we had the DataSource API with DefaultSource et al., which was (in
>>> my opinion) never properly documented.
>>>
>>> With Spark 2.3 we got a new DataSourceV2 (which was also a marker
>>> interface), also not properly documented.
>>>
>>> Now with Spark 3 this seems to be changing again (
>>> https://issues.apache.org/jira/browse/SPARK-25390)? At least the
>>> DataSourceV2 interface is gone, there is still no documentation, and yet
>>> it is somehow still called v2?
>>>
>>> Can anyone shed some light on the current state of data sources & sinks
>>> for batch & streaming in Spark 2.4 and 3.x?
>>>
>>> Thank you!
>>>
>>> Cheers,
>>> Lars
>>>
>>
