spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Cody Koeninger <c...@koeninger.org>
Subject Re: Easy way to get offset metatada with Spark Streaming API
Date Mon, 11 Sep 2017 18:01:27 GMT
https://issues-test.apache.org/jira/browse/SPARK-18258

On Mon, Sep 11, 2017 at 7:15 AM, Dmitry Naumenko <dm.naumenko@gmail.com> wrote:
> Hi all,
>
> It started as a discussion in
> https://stackoverflow.com/questions/46153105/how-to-get-kafka-offsets-with-spark-structured-streaming-api.
>
> So the problem that there is no support in Public API to obtain the Kafka
> (or Kineses) offsets. For example, if you want to save offsets in external
> storage in Custom Sink, you should :
> 1) preserve topic, partition and offset across all transform operations of
> Dataset (based on hard-coded Kafka schema)
> 2) make a manual group by partition/offset with aggregate max offset
>
> Structured Streaming doc says "Every streaming source is assumed to have
> offsets", so why it's not a part of Public API? What do you think about
> supporting it?
>
> Dmitry

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Mime
View raw message