spark-user mailing list archives

From Tobias Pfeiffer <>
Subject Re: Kafka client - specify offsets?
Date Wed, 25 Jun 2014 05:52:37 GMT

Apparently, the parameter "auto.offset.reset" has a different meaning
in Spark's Kafka implementation than what is described in the Kafka
documentation.

The Kafka docs at <>
specify the effect of "auto.offset.reset" as:
> What to do when there is no initial offset in ZooKeeper or if an offset is out of range:
> * smallest : automatically reset the offset to the smallest offset
> * largest : automatically reset the offset to the largest offset
> * anything else: throw exception to the consumer
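
To make the documented rule concrete, here is a minimal sketch in Python. It is a toy model of the quoted text only, not Kafka's actual code; the function name and its arguments are hypothetical, and treating the largest offset as still "in range" is an assumption.

```python
def documented_start_offset(stored, smallest, largest, auto_offset_reset):
    """Toy model of the documented "auto.offset.reset" rule (hypothetical helper)."""
    # A stored offset that lies within the valid range is used unchanged;
    # the reset policy is not consulted at all in this case.
    if stored is not None and smallest <= stored <= largest:
        return stored
    # No initial offset, or the offset is out of range: apply the policy.
    if auto_offset_reset == "smallest":
        return smallest
    if auto_offset_reset == "largest":
        return largest
    # "anything else: throw exception to the consumer"
    raise ValueError("no valid offset and no usable auto.offset.reset policy")
```

Under this reading, a consumer group with a valid offset in ZooKeeper always resumes from that offset, whatever the parameter says.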

However, Spark's implementation seems to drop the part "when there is
no initial offset", as can be seen in
-- it will just wipe the stored offset from ZooKeeper. I guess it's
actually a bug, because the parameter's effect differs from what
is documented, but then it's good for you (and me) because it lets
you specify "I want all that I can get" or "I want to start reading
right now", even if there is an offset stored in ZooKeeper.
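
The behaviour described above can be sketched as a small Python toy model. It is only an illustration of what this message describes, not Spark's actual code: the dict stands in for the offsets stored in ZooKeeper, and the function and argument names are hypothetical. The fallback to "largest" when the parameter is absent assumes Kafka 0.8's documented default.

```python
def spark_like_start_offset(zk_offsets, group_id, kafka_params, smallest, largest):
    """Toy model of the behaviour described in this message (hypothetical helper)."""
    reset = kafka_params.get("auto.offset.reset")
    # As described above: if the parameter is set at all, the stored offset
    # for this consumer group is wiped before the consumer starts.
    if reset is not None:
        zk_offsets.pop(group_id, None)
    stored = zk_offsets.get(group_id)
    if stored is not None:
        return stored
    # No stored offset left, so the reset policy decides the start position.
    # Assumption: "largest" is the default when the parameter is absent.
    policy = reset if reset is not None else "largest"
    if policy == "smallest":
        return smallest
    if policy == "largest":
        return largest
    raise ValueError("unsupported auto.offset.reset value: %r" % policy)


# Usage: a group with a stored offset of 42 on a partition spanning 0..100.
zk = {"my-group": 42}
start = spark_like_start_offset(zk, "my-group", {"auto.offset.reset": "smallest"}, 0, 100)
```

In this model, setting "smallest" makes the group start from offset 0 even though 42 was stored, which is the "I want all that I can get" case; leaving the parameter out keeps the stored offset.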


On Sun, Jun 15, 2014 at 11:27 PM, Tobias Pfeiffer <> wrote:
> Hi,
> there are apparently helpers to tell you the offsets
> <>,
> but I have no idea how to pass that to the Kafka stream consumer. I am
> interested in that as well.
> Tobias
> On Thu, Jun 12, 2014 at 5:53 AM, Michael Campbell
> <> wrote:
>> Is there a way in the Apache Spark Kafka Utils to specify an offset to start
>> reading?  Specifically, from the start of the queue, or failing that, a
>> specific point?
