spark-issues mailing list archives

From "Cody Koeninger (JIRA)" <>
Subject [jira] [Commented] (SPARK-17812) More granular control of starting offsets (assign)
Date Thu, 13 Oct 2016 22:25:20 GMT


Cody Koeninger commented on SPARK-17812:

1. We don't have lists, we have strings.  Regexes and valid topic names overlap (the dot
is the obvious example).

2. Mapping directly to Kafka method names means we don't have to come up with some other (weird
and possibly overlapping) name when they add more ways to subscribe; we just use theirs.

3. I think this makes a mess of Kafka semantics, for the reasons both you and I have already
expressed.  At any rate, I think Michael already clearly punted the "starting X" case to a
different topic.

4. I think it's more than sufficiently clear as suggested; no one is going to expect that
a specific offset they provided is going to be overruled by a general single default.  The
implementation is also crystal clear: seek to the position identified by startingTime, then
seek to any specific offsets for specific partitions.
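
As a rough illustration of points 1 and 4 (not the actual Spark or Kafka code), the Python
sketch below shows why a single string option is ambiguous between a literal topic name and
a regex, and how "general default first, specific offsets second" could resolve starting
positions.  The function names, the sample topic name, and the offsets are all hypothetical,
and startingTime is modeled simply as an already-looked-up offset per partition.

```python
import re


def matches_as_pattern(option_value, topic):
    """Treat the option string as a regex, the way a pattern subscription would.

    A literal topic name such as "events.v1" is also a valid regex, and its
    dot matches any character, so it can match topics the user never meant.
    """
    return re.fullmatch(option_value, topic) is not None


def resolve_starting_offsets(partitions, default_offsets, specific_offsets):
    """Apply the general default first, then let specific offsets override it.

    partitions:       iterable of (topic, partition) tuples
    default_offsets:  dict giving every partition its default starting offset
                      (assumed here to have been derived from startingTime)
    specific_offsets: dict giving user-specified offsets for some partitions
    """
    # First "seek": every partition starts at the general default.
    positions = {tp: default_offsets[tp] for tp in partitions}
    # Second "seek": explicitly provided offsets win for their partitions.
    for tp, offset in specific_offsets.items():
        if tp in positions:
            positions[tp] = offset
    return positions


if __name__ == "__main__":
    print(matches_as_pattern("events.v1", "events.v1"))
    print(matches_as_pattern("events.v1", "eventsXv1"))

    partitions = [("events.v1", 0), ("events.v1", 1)]
    defaults = {("events.v1", 0): 100, ("events.v1", 1): 250}
    specific = {("events.v1", 1): 42}
    print(resolve_starting_offsets(partitions, defaults, specific))
```

In this sketch, partition 1 ends up at the user-supplied offset 42 while partition 0 keeps
its default of 100, which is the behavior point 4 argues users would expect.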

Yes, this is all bikeshedding, but it's bikeshedding that directly affects what people are
actually able to do with the API.  Needlessly restricting it for reasons that have nothing
to do with safety is just going to piss users off for no reason.  Just because you don't have
a use case that needs it doesn't mean you should arbitrarily prevent users from doing it.

Please, just choose something and let me build it so that people can actually use the thing
by the next release....

> More granular control of starting offsets (assign)
> --------------------------------------------------
>                 Key: SPARK-17812
>                 URL:
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Michael Armbrust
> Right now you can only run a Streaming Query starting from either the earliest or latest
offsets available at the moment the query is started.  Sometimes this is a lot of data.  It
would be nice to be able to do the following:
>  - seek to user specified offsets for manually specified topicpartitions
