spark-dev mailing list archives

From Dmitry Naumenko <dm.naume...@gmail.com>
Subject Re: Easy way to get offset metatada with Spark Streaming API
Date Tue, 12 Sep 2017 07:45:59 GMT
Thanks, Cody

Unfortunately, it seems there is no active development on it right now.
Maybe I can step in and help with it somehow?

Dmitry

2017-09-11 21:01 GMT+03:00 Cody Koeninger <cody@koeninger.org>:

> https://issues-test.apache.org/jira/browse/SPARK-18258
>
> On Mon, Sep 11, 2017 at 7:15 AM, Dmitry Naumenko <dm.naumenko@gmail.com>
> wrote:
> > Hi all,
> >
> > It started as a discussion in
> > https://stackoverflow.com/questions/46153105/how-to-get-kafka-offsets-with-spark-structured-streaming-api.
> >
> > The problem is that the public API offers no way to obtain the Kafka (or
> > Kinesis) offsets. For example, if you want to save offsets to external
> > storage in a custom sink, you have to:
> > 1) preserve topic, partition and offset across all transform operations
> > on the Dataset (based on the hard-coded Kafka schema)
> > 2) manually group by partition and aggregate the max offset
> >
> > The Structured Streaming doc says "Every streaming source is assumed to have
> > offsets", so why isn't this part of the public API? What do you think about
> > supporting it?
> >
> > Dmitry
>
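
For reference, the two-step workaround quoted above can be sketched roughly as follows. This is plain Python that only models the logic, not Spark code: the `Record` shape and the `max_offsets` helper are hypothetical names chosen for illustration, and the fields simply mirror the topic, partition and offset values that would have to be carried through every transformation.

```python
from collections import namedtuple

# Hypothetical record shape (an assumption for this sketch): it mirrors the
# topic/partition/offset fields that step 1 says must survive every transform.
Record = namedtuple("Record", ["topic", "partition", "offset", "value"])


def max_offsets(records):
    """Step 2: reduce a batch to the highest offset seen per (topic, partition)."""
    latest = {}
    for r in records:
        key = (r.topic, r.partition)
        if key not in latest or r.offset > latest[key]:
            latest[key] = r.offset
    return latest


# A small made-up batch spanning two partitions of one topic.
batch = [
    Record("events", 0, 41, "a"),
    Record("events", 0, 42, "b"),
    Record("events", 1, 7, "c"),
]
offsets = max_offsets(batch)
```

A custom sink could then write `offsets` to external storage together with the batch output. The awkward part is that the bookkeeping fields have to be threaded through user code by hand, which is the gap described in the quoted message.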
