kafka-users mailing list archives

From Andras Beni <andrasb...@cloudera.com>
Subject Re: difference between 2 options
Date Tue, 27 Feb 2018 14:41:04 GMT
1) We write out one recovery point per log, which in practice means one per
topic partition. So if your topic is called mytopic and has 3 partitions,
the recovery-point-offset-checkpoint file in the log directory holds three
entries: one for mytopic-0, one for mytopic-1, and one for mytopic-2.
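
To make the per-partition bookkeeping concrete, here is a minimal Python
sketch that reads checkpoint text. It assumes the plain-text layout of a
version line, an entry-count line, and then one "topic partition offset"
line per partition. The sample contents and the parse_checkpoint helper are
hypothetical and only for illustration; this is not Kafka's own code.

```python
from typing import Dict, Tuple

# Hypothetical sample contents; the offsets are made up for illustration.
SAMPLE = """0
3
mytopic 0 34700
mytopic 1 120
mytopic 2 98
"""


def parse_checkpoint(text: str) -> Dict[Tuple[str, int], int]:
    """Parse checkpoint text into {(topic, partition): offset}.

    Assumed layout: line 1 = format version, line 2 = number of entries,
    remaining lines = "topic partition offset".
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    _version = int(lines[0])  # read but not interpreted in this sketch
    expected = int(lines[1])
    entries: Dict[Tuple[str, int], int] = {}
    for ln in lines[2:]:
        topic, partition, offset = ln.split()
        entries[(topic, int(partition))] = int(offset)
    if len(entries) != expected:
        raise ValueError(f"expected {expected} entries, found {len(entries)}")
    return entries


if __name__ == "__main__":
    for (topic, partition), offset in sorted(parse_checkpoint(SAMPLE).items()):
        print(f"{topic}-{partition}: checkpointed offset {offset}")
```

Each partition gets its own offset, so three partitions with three
different offsets simply produce three entries.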

2) Data deletion in Kafka is not related to what consumers have read.
Data is deleted when there is either too much of it (log.retention.bytes
property) or it is too old (log.retention.ms property). Consumers keep
track of what they have consumed using the __consumer_offsets topic (or
some custom logic they choose).
What we are talking about here is DeleteRecordsRequest. It is sent by a
command line tool called kafka.admin.DeleteRecordsCommand. This does not
actually delete any data but notes that the data before a given offset
should not be served anymore. That offset is the log start offset, which is
what gets checkpointed every log.flush.start.offset.checkpoint.interval.ms.
This, just like recovery checkpointing, works on a per-partition basis.
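
As a rough sketch of how such a request is specified per partition, the
Python below builds the kind of JSON offsets file that the delete-records
command line tool reads. The JSON shape (a "partitions" list of topic,
partition, and offset, plus "version": 1) is my understanding of that
tool's input and should be checked against your Kafka version; the helper
name build_delete_records_json is hypothetical.

```python
import json
from typing import Dict, Tuple


def build_delete_records_json(offsets: Dict[Tuple[str, int], int]) -> str:
    """Build a JSON document asking that records before each offset
    no longer be served, one entry per (topic, partition)."""
    partitions = [
        {"topic": topic, "partition": partition, "offset": offset}
        for (topic, partition), offset in sorted(offsets.items())
    ]
    return json.dumps({"partitions": partitions, "version": 1}, indent=2)


if __name__ == "__main__":
    # Ask that everything before offset 34700 in mytopic-0 stop being served.
    print(build_delete_records_json({("mytopic", 0): 34700}))
```

The output could be saved to a file and handed to the tool, for example
via bin/kafka-delete-records.sh with its --offset-json-file option,
assuming your distribution ships that script.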

Does this answer your questions?

Best regards,
Andras


On Mon, Feb 26, 2018 at 11:43 PM, adrien ruffie <adriennolarsen@hotmail.fr>
wrote:

> Hi Andras,
>
> Thanks for your response!
>
> For log.flush.offset.checkpoint.interval.ms, do we write out only one
> recovery point for all logs?
>
> If I have 3 partitions and each partition has a different offset, what
> happens? Do we save 3 different offsets in the text file, or just one
> for the three partitions?
>
> When you say "to avoid exposing data that have been deleted by
> DeleteRecordsRequest", does that mean the old, already consumed data?
> For example, if I am at offset 34700, is it to avoid re-exposing
> records 34000~34699 to the consumer after a crash?
>
> ________________________________
> From: Andras Beni <andrasbeni@cloudera.com>
> Sent: Tuesday, 27 February 2018 06:16:41
> To: users@kafka.apache.org
> Subject: Re: difference between 2 options
>
> Hi Adrien,
>
> Every log.flush.offset.checkpoint.interval.ms, we write out the current
> recovery point for all logs to a text file in the log directory to avoid
> recovering the whole log on startup.
>
> And every log.flush.start.offset.checkpoint.interval.ms, we write out the
> current log start offset for all logs to a text file in the log directory
> to avoid exposing data that have been deleted by DeleteRecordsRequest.
>
> HTH,
> Andras
>
>
> On Mon, Feb 26, 2018 at 1:51 PM, adrien ruffie <adriennolarsen@hotmail.fr>
> wrote:
>
> > Hello all,
> >
> > I have read the linked properties documentation, but I don't really
> > understand the difference between:
> >
> > log.flush.offset.checkpoint.interval.ms
> >
> > and
> >
> > log.flush.start.offset.checkpoint.interval.ms
> >
> > Do you have a use case for each property? I can't figure out what the
> > difference is.
> >
> > Best regards,
> >
> > Adrien
> >
>
