spark-user mailing list archives

From Arun Mahadevan <ar...@apache.org>
Subject Re: Plans for Session Windows?
Date Wed, 29 Aug 2018 18:18:09 GMT
I think you need to adjust your logic around the watermark and the merging of
session state. I posted an answer on SO.
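
For anyone following along, here is a minimal sketch of what "merging states" and the watermark check involve. It is plain Python, not Spark API code, and it is not the logic from the SO answer or from StructuredSessionization.scala. The names Session, merge_events, split_closed and the 30-second GAP are all hypothetical, chosen only for illustration. The sketch assumes a session covers [first event, last event + gap) and that two sessions merge when those intervals overlap.

```python
from dataclasses import dataclass
from typing import List, Tuple

GAP = 30  # assumed session gap in seconds (illustrative value)


@dataclass
class Session:
    start: int  # event time of the first event in the session
    end: int    # event time of the last event plus the gap
    count: int  # number of events folded into the session


def merge_events(sessions: List[Session], timestamps: List[int],
                 gap: int = GAP) -> List[Session]:
    """Fold new event timestamps into existing sessions.

    Each new event starts as its own one-event session [t, t + gap).
    Overlapping sessions are then merged, so a late event can bridge
    two sessions that were previously separate.
    """
    intervals = sessions + [Session(t, t + gap, 1) for t in timestamps]
    intervals.sort(key=lambda s: s.start)
    merged: List[Session] = []
    for s in intervals:
        if merged and s.start < merged[-1].end:
            last = merged[-1]
            merged[-1] = Session(last.start, max(last.end, s.end),
                                 last.count + s.count)
        else:
            merged.append(Session(s.start, s.end, s.count))
    return merged


def split_closed(sessions: List[Session],
                 watermark: int) -> Tuple[List[Session], List[Session]]:
    """Separate sessions the watermark has passed from those still open.

    A session whose end is at or before the watermark can no longer
    receive events, so it is safe to emit it and drop its state.
    """
    closed = [s for s in sessions if s.end <= watermark]
    still_open = [s for s in sessions if s.end > watermark]
    return closed, still_open


# First batch: two events close together and one far away.
sessions = merge_events([], [0, 10, 100])
# Second batch: event 35 extends the first session; event 80 merges
# with the session that started at 100.
sessions = merge_events(sessions, [35, 80])
closed, still_open = split_closed(sessions, watermark=70)
```

In a streaming job the per-key list of sessions would live in the state store, and the watermark would come from the engine rather than being passed in by hand. How the watermark is derived is outside the scope of this sketch.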

On Wed, 29 Aug 2018 at 10:39, Mike Sukmanowsky <mike.sukmanowsky@gmail.com>
wrote:

> Somewhat related, still waiting to see if anyone answers this
> https://stackoverflow.com/questions/51810460/is-proper-event-time-sessionization-possible-with-spark-structured-streaming
>
> On Thu, 9 Aug 2018 at 16:30, Arun Mahadevan <arunm@apache.org> wrote:
>
>> Not sure if it changed recently, but the Beam Spark runner still uses the
>> DStreams API. It need not use the higher-level APIs (like
>> mapGroupsWithState or equivalent) provided by Spark; it could just as well
>> be based on groupByKey or reduce primitives and the window functions
>> provided by Beam.
>>
>> On 9 August 2018 at 12:23, Mike Sukmanowsky <mike.sukmanowsky@gmail.com>
>> wrote:
>>
>>> Thanks Arun. It's curious as Apache Beam says it fully supports
>>> <https://beam.apache.org/documentation/runners/capability-matrix/>
>>> session windows running on Spark but I'd imagine that, under the hood, it's
>>> leveraging mapGroupsWithState.
>>>
>>> Are there any plans to support the mapGroupsWithState API in PySpark?
>>>
>>> On Thu, 9 Aug 2018 at 13:13, Arun Mahadevan <arunm@apache.org> wrote:
>>>
>>>> There is no stock API to do this directly, but it can be implemented on
>>>> top of mapGroupsWithState, as in this example:
>>>>
>>>> https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/sql/streaming/StructuredSessionization.scala
>>>>
>>>> It would be worth bundling this into a built-in API, but AFAIK there are
>>>> no plans yet.
>>>>
>>>> Thanks,
>>>>
>>>> Arun
>>>>
>>>>
>>>> On 9 August 2018 at 08:02, Mike Sukmanowsky <mike.sukmanowsky@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> Just wondering if Spark has any plans to support session-based windows
>>>>> in structured streaming as documented by Apache Beam here
>>>>> <https://beam.apache.org/documentation/programming-guide/#provided-windowing-functions>
>>>>> or perhaps better in this blog post
>>>>> <https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102>?
>>>>>
>>>>> [image: Figure-16---Session-Merging-a526d52186fcc33f3b5b5c59b176ac4e.jpg]
>>>>>
>>>>> --
>>>>> Mike Sukmanowsky
>>>>> Aspiring Digital Carpenter
>>>>>
>>>>> *e*: mike.sukmanowsky@gmail.com
>>>>>
>>>>> LinkedIn <http://www.linkedin.com/profile/view?id=10897143> | github
>>>>> <https://github.com/msukmanowsky>
>>>>>
>>>>>
>>>>
>>>
>>
>
