kafka-users mailing list archives

From Gene Robichaux <Gene.Robich...@match.com>
Subject RE: Backups
Date Tue, 20 Jan 2015 18:57:58 GMT
We are using a 7-day retention. I like the idea of using a console-consumer to dump the topics
nightly, but with a replication factor of 3 in each DC I am not sure it is even worth the
trouble. We would have to have a pretty tragic event to take out all 4 of 10 servers...

Could I have a single node in each DC that I could shut down and then take a "somewhat consistent"
snapshot? My challenge will be the amount of data to move from one to another.

The more I think about it the more I am inclined to not worry about an actual backup. Kafka
is just moving data from A to B....we have backups at both A and B....not going to worry too
much about the middle.

Thanks all for the informative comments.

Gene Robichaux
Manager, Database Operations
8300 Douglas Avenue I Suite 800 I Dallas, TX  75225

-----Original Message-----
From: Jayesh Thakrar [mailto:j_thakrar@yahoo.com.INVALID] 
Sent: Tuesday, January 20, 2015 12:16 PM
To: users@kafka.apache.org
Subject: Re: Backups

Another option is to copy data from each topic (of interest/concern) to a "flat file on a
periodic basis". E.g. say you had a queue that only contained "textual data". Periodically I
would run the bundled console-consumer to read data from the queue and dump it to a file/directory,
back it up without any worry, and then move the file/directory out once the backup
is complete. This can serve as a point-in-time incremental snapshot.
And if you are itching to exercise your DBA roots (full backup followed by incremental backups),
then you could use the "read from the beginning" option of the console-consumer every once
in a while. Note, though, that the "full" backup is constrained by the retention period of the data
(controlled at the queue/cluster level).
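
A minimal sketch of that periodic dump, for illustration only. It assumes the 0.8-era
console consumer flags (--zookeeper, --topic, --from-beginning); newer releases take
--bootstrap-server instead of --zookeeper. The script path, ZooKeeper address, topic name,
output directory and time limit are all hypothetical. Pinning a fixed consumer group so that
successive runs stay incremental depends on the Kafka version and is left out here.

```python
import subprocess
from datetime import date
from pathlib import Path


def build_dump_command(topic, zookeeper, from_beginning=False,
                       consumer_script="bin/kafka-console-consumer.sh"):
    """Build the console-consumer command line for one topic.

    consumer_script is an assumed install path; adjust for your layout.
    """
    cmd = [consumer_script, "--zookeeper", zookeeper, "--topic", topic]
    if from_beginning:
        # "Full" dump: re-read everything still inside the retention window.
        cmd.append("--from-beginning")
    return cmd


def dump_topic(topic, zookeeper, out_dir, seconds, from_beginning=False):
    """Write the consumer's stdout to a dated flat file and return its path."""
    out_path = Path(out_dir) / f"{topic}-{date.today().isoformat()}.txt"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, "wb") as out:
        try:
            subprocess.run(build_dump_command(topic, zookeeper, from_beginning),
                           stdout=out, timeout=seconds, check=False)
        except subprocess.TimeoutExpired:
            # The console consumer runs until stopped, so the timeout is
            # what ends the dump in this sketch.
            pass
    return out_path
```

A nightly job might call dump_topic("events", "zk1:2181", "/backup/kafka", seconds=600) and
then hand the resulting file to the regular backup tooling.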
From: Gwen Shapira <gshapira@cloudera.com>
To: "users@kafka.apache.org" <users@kafka.apache.org>
Sent: Tuesday, January 20, 2015 12:07 PM
Subject: Re: Backups
Interesting question.
I think you'll need to sync it for the exact same time across the entire cluster, otherwise
you'll recover from an inconsistent state.
Not sure if this is feasible, or how Kafka handles starting from inconsistent state.

If I were the sysadmin, I'd go with the good old MySQL method:
MirrorMaker to a replica, once a day stop MirrorMaker, stop the replica and take a cold backup.
Move copy to tape / NAS / offsite storage.
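
The cold-backup routine above could be sketched roughly as follows. This is an illustration
under assumptions, not a tested procedure: stop_services and start_services are hypothetical
hooks that you would wire to however MirrorMaker and the replica broker are stopped and started
in your environment, and log_dirs is whatever the replica's log.dir setting points at.

```python
import tarfile
import time
from pathlib import Path


def cold_backup(log_dirs, dest_dir, stop_services, start_services):
    """Stop the writers, archive each log directory, then restart them.

    stop_services / start_services are caller-supplied callables
    (hypothetical hooks: stop MirrorMaker then the replica broker, and
    start them again in reverse order).
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archives = []
    stop_services()
    try:
        for i, log_dir in enumerate(log_dirs):
            src = Path(log_dir)
            archive = dest / f"{src.name}-{i}-{stamp}.tar.gz"
            # Nothing is writing to the files now, so a plain copy is consistent.
            with tarfile.open(archive, "w:gz") as tar:
                tar.add(src, arcname=src.name)
            archives.append(archive)
    finally:
        # Restart even if archiving failed, so the replica does not stay down.
        start_services()
    return archives
```

The returned archive paths are what would then be moved to tape / NAS / offsite storage.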

I'll be curious to hear what the LinkedIn team is doing.


On Tue, Jan 20, 2015 at 8:48 AM, Otis Gospodnetic <otis.gospodnetic@gmail.com> wrote:
> Could one use ZFS or BTRFS snapshot functionality for this?
> Otis
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management 
> Solr & Elasticsearch Support * http://sematext.com/
> On Tue, Jan 20, 2015 at 1:39 AM, Gwen Shapira <gshapira@cloudera.com> wrote:
>> Hi,
>> As a former DBA, I hear you on backups :)
>> Technically, you could copy all log.dir files somewhere safe 
>> occasionally. I'm pretty sure we don't guarantee the consistency or 
>> safety of this copy. You could find yourself with a corrupt "backup"
>> by copying files that are either in the middle of getting written or 
>> are inconsistent in time with other files. Kafka doesn't have a good 
>> way to stop writing to files for long enough to allow copying them 
>> safely.
>> Unlike traditional backups, there's no transaction log that can be 
>> rolled to move a disk copy forward in time (or that can be used when 
>> data files are locked for backups). In Kafka, the files *are* the 
>> transaction log and you roll back in time by deciding which offsets 
>> to read.
>> DR is possible using MirrorMaker though, since the only thing better 
>> than replication is... more replication!
>> So you could create a non-corrupt file copy by stopping a MirrorMaker 
>> replica occasionally and copying all files somewhere safe.
>> If it helps you sleep better at night :) Typically having Kafka nodes 
>> on multiple racks and a DR in another data center is considered 
>> pretty safe.
>> Gwen
>> On Wed, Jan 14, 2015 at 9:22 AM, Gene Robichaux 
>> <Gene.Robichaux@match.com> wrote:
>> > Does anyone have any thoughts on Kafka broker backups?
>> >
>> > All of our topics have a replication factor of 3. However I just
>> > want to know if anyone does anything about traditional backups. My
>> > background is Ops DBA, so I have a special place in my heart for backups.
>> >
>> >
>> > Gene Robichaux
>> > Manager, Database Operations
>> > Match.com
>> > 8300 Douglas Avenue I Suite 800 I Dallas, TX  75225
>> >
