kafka-users mailing list archives

From Matthew Rathbone <matt...@foursquare.com>
Subject Consuming from X days ago & issues consuming from the beginning of time
Date Thu, 20 Sep 2012 16:20:58 GMT
Hey guys,

I've come across this behavior with the hadoop-consumer, but it certainly
applies to any consumer.

We've had our brokers up and running for about 9 days, with a 7-day
retention policy. (3 brokers with 3 partitions each)
I've just deployed a new hadoop consumer and wanted to read from the
beginning of time (7 days ago).

Here's the behavior I'm seeing:
- I tell the consumer to start from 0
- It queries the partition, finds the minimum available is 2000000, so it
starts there
- It starts consuming from 2000000+
- It throws an exception ("kafka.common.OffsetOutOfRangeException") because
that message was deleted already
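
One way to sidestep the race in the steps above is to treat the
out-of-range error as a signal to re-query the earliest offset and try
again, rather than failing the task. The sketch below is only an
illustration: FakePartition, OffsetOutOfRange and consume_from_start are
hypothetical stand-ins written for this example, not part of the Kafka
client or the hadoop-consumer.

```python
class OffsetOutOfRange(Exception):
    """Stand-in for kafka.common.OffsetOutOfRangeException (hypothetical)."""


class FakePartition:
    """Hypothetical in-memory partition; not a real Kafka client."""

    def __init__(self, earliest, latest):
        self.earliest = earliest  # smallest offset still retained
        self.latest = latest      # offset one past the newest message

    def earliest_offset(self):
        return self.earliest

    def fetch(self, offset):
        # Offsets below the retained range have already been deleted.
        if offset < self.earliest:
            raise OffsetOutOfRange(offset)
        return list(range(offset, min(offset + 100, self.latest)))


def consume_from_start(partition, max_retries=5):
    """Start at the earliest offset; on out-of-range, re-query and retry."""
    offset = partition.earliest_offset()
    for _ in range(max_retries):
        try:
            return offset, partition.fetch(offset)
        except OffsetOutOfRange:
            # Retention removed the data between the query and the fetch,
            # so ask again for the current minimum instead of giving up.
            offset = partition.earliest_offset()
    raise RuntimeError("earliest offset kept moving; gave up")
```

Retrying inside the consumer does the same thing the repeated task
failures did by accident, but without restarting the whole job.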

Through sheer luck, after a few task failures the job managed to beat this
race condition, but it raises the question:

- How would I tell a consumer to start consuming from T-4 days? That would
totally solve the issue. I don't really need a full 7 days, but I have no
way to resolve time -> offset.
(This would also be useful for people tailing the events, so they could
tail events from 3 days ago and grep for something.)
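
To make the time -> offset idea concrete, here is a rough sketch of a
lookup at log-segment granularity. It assumes that each segment's
last-modified time and starting offset are available somehow; the
segment list, the sample numbers and the offset_for_time helper are all
made up for illustration and do not come from Kafka.

```python
from bisect import bisect_right


def offset_for_time(segments, target_ms):
    """Map a timestamp to a starting offset at segment granularity.

    segments: non-empty list of (last_modified_ms, start_offset) tuples,
    sorted by last_modified_ms (hypothetical data, oldest first).
    Returns the start offset of the newest segment last modified at or
    before target_ms, or the oldest segment's start if none qualifies.
    """
    if not segments:
        raise ValueError("no segments available")
    times = [t for t, _ in segments]
    i = bisect_right(times, target_ms)
    if i == 0:
        # target is older than everything retained: start at the oldest.
        return segments[0][1]
    return segments[i - 1][1]


DAY_MS = 24 * 60 * 60 * 1000
# Hypothetical segments: (last modified, first offset in the segment).
example_segments = [
    (1 * DAY_MS, 2000000),
    (3 * DAY_MS, 2500000),
    (5 * DAY_MS, 3000000),
]
start = offset_for_time(example_segments, 4 * DAY_MS)
```

The result can land up to one segment earlier than the requested time,
which is fine for "roughly 4 days ago" but not for exact replay.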

Any ideas? Anyone else experienced this?
Matthew Rathbone
Foursquare | Software Engineer | Server Engineering Team
matthew@foursquare.com | @rathboma <http://twitter.com/rathboma>
