kafka-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From James Cheng <jch...@tivo.com>
Subject Re: Checkpointing with custom metadata
Date Tue, 04 Aug 2015 01:46:05 GMT
Nice new email address, Gwen. :)

On Aug 3, 2015, at 3:17 PM, Gwen Shapira <gwen@confluent.io> wrote:

> You are correct. You can see that ZookeeperConsumerConnector is hardcoded
> with null metadata.
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala#L310
> More interesting, it looks like the Metadata is not exposed in the new
> KafkaConsumer either.
> Mind sharing what did you plan to use it for? this will help us figure out
> how to expose it :)

I'm juggling around a couple designs on what I'm trying to do, so it may turn out that I actually
don't need it.

My generic answer is, I have some application state related to the processing of a topic,
and I wanted to store it somewhere persistent that will survive across process death and disk
failure. So storing it with the offset seemed like a nice solution. I could alternatively
store it in a standalone kafka topic instead.

The more detailed answer: I'm doing a time-based merge sort of 3 topics (A, B, and C) and
outputtiing the results into a new output topic (let's call it "sorted-topic"). Except that
during the initial creation of "sorted-topic", and I want all of topic A to be output to sorted-topic
first, and then followed by an on-going merge sort of topics B, C, and any updates to A that
come along.

I was trying to handle the situation of what happens if I crash when initially copying topic
A into sorted-topic. And I was thinking that I could save some metadata in my topic A checkpoint
that says "still doing initial copy of A". So that way, when I start up next time, I would
know to continue copying topic A to the output. Once I have finished copying all of A to sorted-topic,
that I would store "finished doing initial copy of A" into my checkpoint, and that upon restart,
I would check that and know to start doing the merge sort of A B C.

I have a couple other designs that seem cleaner, tho, so I might not actually need it.


> Gwen
> On Mon, Aug 3, 2015 at 1:52 PM, James Cheng <jcheng@tivo.com> wrote:
>> According to
>> https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-OffsetCommitRequest,
>> we can store custom metadata with our checkpoints. It looks like the high
>> level consumer does not support committing offsets with metadata, and that
>> in order to checkpoint with custom metadata, we have to issue the
>> OffsetCommitRequest ourselves. Is that correct?
>> Thanks,
>> -James

View raw message