kafka-users mailing list archives

From Gwen Shapira <g...@confluent.io>
Subject Re: Checkpointing with custom metadata
Date Tue, 04 Aug 2015 02:01:15 GMT
If it was my architecture, I'd consider publishing a "control message",
either to the same topic or to a separate one, saying "done with topic A".
The "thing" that needs to continue copying A also needs to know where to
continue, so presumably if the "done with A" message is missing, it will
read from sorted-topic and find out where it was at. So even if we
accidentally dropped the "done" message, it will waste a bit of time, but
won't corrupt anything.
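The recovery logic above can be sketched broker-free. This is just an illustration of the idea, not a real Kafka API; the record shape (`source` / `source_offset` fields) and the function name are assumptions made up for the example:

```python
def resume_point(control_messages, sorted_topic):
    """Decide where to resume copying topic A after a crash.

    control_messages: strings seen on the control topic/stream.
    sorted_topic: records already written to sorted-topic, each a dict
    with the source topic name and the source offset it came from.
    """
    if "done with A" in control_messages:
        return None  # initial copy finished; nothing to resume

    # No "done" marker: fall back to scanning sorted-topic to find how
    # far the copy of A got. Wasteful if the marker was merely dropped,
    # but never incorrect.
    last = -1
    for record in sorted_topic:
        if record["source"] == "A":
            last = max(last, record["source_offset"])
    return last + 1  # next offset of A to copy
```

For example, with no "done" marker and records of A up to offset 41 already in sorted-topic, this resumes at offset 42; with the marker present it skips the copy phase entirely.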

This is one of the use-cases where transactional producers would be so cool
to have.

I can see how adding metadata to the commit message can emulate some
light-weight transactions, but I'd be concerned that this capability could
get abused...
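As a rough illustration of that emulation, here is a broker-free sketch in which a plain dict stands in for the committed-offset store (the real OffsetCommitRequest carries an offset plus an opaque metadata string per partition). The metadata strings and function names are invented for the example:

```python
checkpoints = {}  # (topic, partition) -> (offset, metadata)

def commit(topic, partition, offset, metadata=""):
    """Stand-in for an offset commit that carries custom metadata."""
    checkpoints[(topic, partition)] = (offset, metadata)

def restart_phase(topic, partition):
    """On restart, use the checkpoint metadata to pick the phase."""
    # If we have never committed, assume we are still in the initial copy.
    offset, metadata = checkpoints.get((topic, partition), (0, "copying-A"))
    return "initial-copy" if metadata == "copying-A" else "merge-sort"
```

The "transaction" here is just the atomicity of a single commit: the offset and the phase marker are written together, so a crash can never leave them disagreeing.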

Thanks. I like my new address :)

On Mon, Aug 3, 2015 at 6:46 PM, James Cheng <jcheng@tivo.com> wrote:

> Nice new email address, Gwen. :)
> On Aug 3, 2015, at 3:17 PM, Gwen Shapira <gwen@confluent.io> wrote:
> > You are correct. You can see that ZookeeperConsumerConnector is hardcoded
> > with null metadata.
> >
> > https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala#L310
> >
> > More interesting, it looks like the Metadata is not exposed in the new
> > KafkaConsumer either.
> >
> > Mind sharing what you plan to use it for? This will help us figure out
> > how to expose it :)
> >
> I'm juggling around a couple designs on what I'm trying to do, so it may
> turn out that I actually don't need it.
> My generic answer is, I have some application state related to the
> processing of a topic, and I wanted to store it somewhere persistent that
> will survive across process death and disk failure. So storing it with the
> offset seemed like a nice solution. I could alternatively store it in a
> standalone kafka topic instead.
> The more detailed answer: I'm doing a time-based merge sort of 3 topics
> (A, B, and C) and outputting the results into a new output topic (let's
> call it "sorted-topic"). Except that during the initial creation of
> "sorted-topic", I want all of topic A to be output to sorted-topic
> first, followed by an on-going merge sort of topics B, C, and any
> updates to A that come along.
> I was trying to handle the situation of what happens if I crash while
> initially copying topic A into sorted-topic. I was thinking that I
> could save some metadata in my topic A checkpoint that says "still doing
> initial copy of A". That way, when I start up next time, I would know to
> continue copying topic A to the output. Once I have finished copying all of
> A to sorted-topic, I would store "finished doing initial copy of A" in my
> checkpoint, and upon restart, I would check that and know to start doing
> the merge sort of A, B, and C.
> I have a couple other designs that seem cleaner, though, so I might not
> actually need it.
> -James
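Once the initial copy of A is done, the steady-state phase James describes boils down to a time-based k-way merge. A minimal broker-free sketch, assuming each record carries a `ts` timestamp and each per-topic iterator already yields records in timestamp order (which per-partition Kafka reads would give you):

```python
import heapq

def merge_topics(topic_a, topic_b, topic_c):
    """Merge three timestamp-ordered record streams into one.

    heapq.merge is lazy, so this works on unbounded streams: it only
    holds one pending record per input at a time.
    """
    yield from heapq.merge(topic_a, topic_b, topic_c,
                           key=lambda record: record["ts"])
```

Each yielded record would then be produced to sorted-topic in order.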
> > Gwen
> >
> >
> > On Mon, Aug 3, 2015 at 1:52 PM, James Cheng <jcheng@tivo.com> wrote:
> >
> >> According to
> >> https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-OffsetCommitRequest,
> >> we can store custom metadata with our checkpoints. It looks like the high
> >> level consumer does not support committing offsets with metadata, and that
> >> in order to checkpoint with custom metadata, we have to issue the
> >> OffsetCommitRequest ourselves. Is that correct?
> >>
> >> Thanks,
> >> -James
> >>
> >>
