spark-user mailing list archives

From Jungtaek Lim <kabh...@gmail.com>
Subject Re: [Spark Structured Streaming]: Read kafka offset from a timestamp
Date Tue, 20 Nov 2018 00:37:35 GMT
It really depends on whether we use the timestamp only to start a new query
(as opposed to restoring from a checkpoint), or whether we also want to
restore a previous batch from a specific time, with its state restored as
well.

The former makes sense, and I'll try to see whether I can address it. The
latter doesn't look possible: we would normally want to go back at least a
couple of hours, by which point hundreds of batches may have been processed
and no information about those batches is kept. It sounds like you're
describing the latter, but if we agree the former is helpful enough on its
own, I can take a look at it.

By the way, this could help batch queries as well; in fact it sounds even
more useful there.
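
A minimal sketch of what is possible today without any new option: once
per-partition offsets for a timestamp have been resolved (e.g. via Kafka's
offsetsForTimes), the Kafka source already accepts them through its
startingOffsets JSON. The topic name and offsets below are made up for
illustration.

```python
import json

def starting_offsets_json(offsets):
    """Turn {(topic, partition): offset} pairs -- e.g. as resolved for a
    timestamp via the Kafka consumer's offsetsForTimes -- into the JSON
    string Spark's Kafka source accepts for its startingOffsets option."""
    out = {}
    for (topic, partition), offset in offsets.items():
        out.setdefault(topic, {})[str(partition)] = offset
    return json.dumps(out, sort_keys=True)

# Hypothetical offsets, as offsetsForTimes might resolve them for one timestamp:
print(starting_offsets_json({("events", 0): 12345, ("events", 1): 67890}))
# -> {"events": {"0": 12345, "1": 67890}}
```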

-Jungtaek Lim (HeartSaVioR)

On Sun, Nov 18, 2018 at 9:53 AM, puneetloya <puneetloya@gmail.com> wrote:

> I would like to request a feature: reading data from the Kafka source
> based on a timestamp, so that an application can process data starting
> from a certain time. I agree that checkpoints give us continuity of the
> stream, but what if I want to rewind past a checkpoint?
> According to Spark experts, it's not advised to edit checkpoints, and
> finding the right offsets to replay in Spark is tricky. Replaying from a
> certain timestamp is a lot easier, at least with a decent monitoring
> system (the time from which things started to fall apart, like a buggy
> push or a bad settings change).
>
> The Kafka consumer API supports this via the offsetsForTimes method,
> which can easily give the right offsets:
>
> https://kafka.apache.org/10/javadoc/org/apache/kafka/clients/consumer/Consumer.html#offsetsForTimes
>
> Similar to startingOffsets and endingOffsets, the source could support
> startingTimestamp and endingTimestamp.
>
> In a SaaS environment, where data flows continuously, small tweaks like
> these help us repair our systems. Spark Structured Streaming is already
> great, but features like these keep things under control in a live
> production processing environment.
>
> Cheers,
> Puneet
>
>
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>
>
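
The timestamp-to-offset lookup the request relies on can be sketched as
follows. FakeConsumer is a stand-in, not a real Kafka client (with
kafka-python the equivalent call is KafkaConsumer.offsets_for_times); its
canned log data is made up for illustration.

```python
def offsets_at(consumer, topic, partitions, timestamp_ms):
    """For each partition, find the earliest offset whose record timestamp
    is >= timestamp_ms, mirroring Consumer.offsetsForTimes semantics.
    Partitions with no matching record (lookup returns None) are skipped."""
    query = {(topic, p): timestamp_ms for p in partitions}
    resolved = consumer.offsets_for_times(query)
    return {tp: off for tp, off in resolved.items() if off is not None}

class FakeConsumer:
    # Canned (timestamp_ms, offset) pairs per partition, purely illustrative.
    LOG = {
        ("events", 0): [(1000, 10), (2000, 11), (3000, 12)],
        ("events", 1): [(1500, 40), (2500, 41)],
    }

    def offsets_for_times(self, query):
        out = {}
        for tp, ts in query.items():
            # First offset at or after the queried timestamp, else None.
            out[tp] = next((off for t, off in self.LOG[tp] if t >= ts), None)
        return out

print(offsets_at(FakeConsumer(), "events", [0, 1], 2000))
# -> {('events', 0): 11, ('events', 1): 41}
```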
