spark-user mailing list archives

From Chang Chen <baibaic...@gmail.com>
Subject Re: The Future Of DStream
Date Wed, 27 Jul 2016 10:45:13 GMT
Things like Kafka and user-defined sources are not supported yet, simply
because Structured Streaming is still in the alpha stage.

Things like sort are not supported because they are difficult to implement,
and I don't think DStream can support them either.

What I want to know is the difference between the APIs (or abstractions). For
example, it is quite easy to reuse the same code for batch and streaming data
because of the unbounded table abstraction (which comes from Google's Dataflow
paper); that's why the internal engine is based on the logical plan, Spark plan,
and RDD. In contrast, DStream can't do the same thing easily.
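
To make the unbounded table idea concrete, here is a minimal toy sketch in plain Python. It does not use Spark or its API; the names word_counts, run_batch and run_streaming are hypothetical and chosen only for illustration. It shows one query function reused unchanged for a complete table and for a table that grows as micro-batches arrive.

```python
from collections import Counter
from typing import Iterable, List


def word_counts(table: Iterable[str]) -> Counter:
    """The 'query': count words over whatever rows the table currently holds."""
    counts: Counter = Counter()
    for line in table:
        counts.update(line.split())
    return counts


def run_batch(rows: List[str]) -> Counter:
    # Batch: the table is complete, so the query runs once.
    return word_counts(rows)


def run_streaming(micro_batches: Iterable[List[str]]) -> List[Counter]:
    # Streaming (toy model): every arriving micro-batch is appended to an
    # ever-growing table, and the same query is re-evaluated over it.
    unbounded_table: List[str] = []
    results: List[Counter] = []
    for batch in micro_batches:
        unbounded_table.extend(batch)
        results.append(word_counts(unbounded_table))
    return results


rows = ["a b a", "b c", "a"]
batch_result = run_batch(rows)
stream_results = run_streaming([["a b a"], ["b c", "a"]])
```

Once all the input has arrived, the last streaming result matches the batch result. This toy recomputes the query from scratch on each micro-batch; a real engine would presumably maintain the result incrementally instead.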

Actually, Dataset supports map, flatMap, and reduce, and hence I can do any
user-defined work in theory. That's why I am asking what kind of low-level
control DStream offers that Structured Streaming does not.

Thanks
Chang





On Wed, Jul 27, 2016 at 6:03 PM, Ofir Manor <ofir.manor@equalum.io> wrote:

> For the 2.0 release, look for "Unsupported Operations" here:
>
> http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html
> Also, there are bigger gaps - like no Kafka support, no way to plug
> user-defined sources or sinks etc
>
> Ofir Manor
>
> Co-Founder & CTO | Equalum
>
> Mobile: +972-54-7801286 | Email: ofir.manor@equalum.io
>
> On Wed, Jul 27, 2016 at 11:24 AM, Chang Chen <baibaichen@gmail.com> wrote:
>
>>
>> I don't understand what kind of low-level control DStream offers
>> that Structured Streaming does not.
>>
>> Thanks
>> Chang
>>
>> On Wednesday, July 27, 2016, Matei Zaharia <matei.zaharia@gmail.com>
>> wrote:
>>
>>> Yup, they will definitely coexist. Structured Streaming is currently
>>> alpha and will probably be complete in the next few releases, but Spark
>>> Streaming will continue to exist, because it gives the user more low-level
>>> control. It's similar to DataFrames vs RDDs (RDDs are the lower-level API
>>> for when you want control, while DataFrames do more optimizations
>>> automatically by restricting the computation model).
>>>
>>> Matei
>>>
>>> On Jul 27, 2016, at 12:03 AM, Ofir Manor <ofir.manor@equalum.io> wrote:
>>>
>>> Structured Streaming in 2.0 is declared as alpha - plenty of bits still
>>> missing:
>>>
>>> http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html
>>> I assume that it will be declared stable / GA in a future 2.x release,
>>> and then it will co-exist with DStream for quite a while before someone
>>> will suggest to start a deprecation process that will eventually lead to
>>> its removal...
>>> As a user, I guess we will need to apply judgement about when to switch
>>> to Structured Streaming - each of us have a different risk/value tradeoff,
>>> based on our specific situation...
>>>
>>> Ofir Manor
>>>
>>> Co-Founder & CTO | Equalum
>>>
>>> Mobile: +972-54-7801286 | Email: ofir.manor@equalum.io
>>>
>>> On Wed, Jul 27, 2016 at 8:02 AM, Chang Chen <baibaichen@gmail.com>
>>> wrote:
>>>
>>>> Hi guys
>>>>
>>>> Structured Streaming is coming with Spark 2.0, but I noticed that DStream
>>>> is still here.
>>>>
>>>> What's the future of DStream? Will it be deprecated and removed
>>>> eventually, or will it coexist with Structured Streaming forever?
>>>>
>>>> Thanks
>>>> Chang
>>>>
>>>>
>>>
>>>
>
