kudu-user mailing list archives

From Mike Percy <mpe...@apache.org>
Subject Re: Backup and restore of Kudu Metadata/Data
Date Mon, 22 Aug 2016 23:14:38 GMT
I would recommend a snapshot scan for data backup. You can easily do that
with MapReduce.
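A snapshot scan gives a consistent data backup because reads are multi-versioned. Every write carries a timestamp, and a scan pinned to snapshot time T sees only the newest version of each row written at or before T, even while new writes keep arriving. The Python sketch below is a toy in-memory model of that idea, not Kudu's client API; `ToyMvccTable`, `write`, and `snapshot_scan` are hypothetical names. (As far as I know, the corresponding scanner setting in the Kudu Java client is `ReadMode.READ_AT_SNAPSHOT`.)

```python
from dataclasses import dataclass, field


@dataclass
class ToyMvccTable:
    """Hypothetical multi-version table: keeps every (timestamp, value) per key."""
    versions: dict = field(default_factory=dict)  # key -> list of (ts, value)

    def write(self, ts: int, key: str, value: str) -> None:
        # Append a new version; older versions stay available for snapshot reads.
        self.versions.setdefault(key, []).append((ts, value))

    def snapshot_scan(self, snapshot_ts: int) -> dict:
        # For each key, return the newest version written at or before snapshot_ts.
        result = {}
        for key, history in self.versions.items():
            visible = [(ts, v) for ts, v in history if ts <= snapshot_ts]
            if visible:
                result[key] = max(visible, key=lambda tv: tv[0])[1]
        return result


table = ToyMvccTable()
table.write(1, "a", "v1")
table.write(2, "b", "v1")
backup_ts = 2                # the backup job pins its scan to this timestamp
table.write(3, "a", "v2")    # concurrent writes that arrive during the backup
table.write(4, "c", "v1")
print(table.snapshot_scan(backup_ts))  # {'a': 'v1', 'b': 'v1'}
```

Writes stamped after `backup_ts` are invisible to the backup scan, so the exported rows describe one point in time rather than a mix of before and after.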

Metadata backup is tough. One thing you could do is back up the master data
and WAL directories. If your filesystem supports snapshots, then taking a
snapshot of those directories should give you a consistent backup.
Otherwise, you should shut down the master, copy the master data and WAL
directories, then bring the master back up.
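Here is a minimal Python sketch of the shut-down-and-copy procedure. It assumes you already know which directories the master uses, which are typically whatever its `--fs_wal_dir` and `--fs_data_dirs` flags point at. `cold_backup_master` is a hypothetical helper, and `stop_master`/`start_master` are placeholder hooks you would wire to your own process manager. Nothing here is a Kudu tool.

```python
import shutil
import time
from pathlib import Path
from typing import Callable, Iterable


def cold_backup_master(
    master_dirs: Iterable[Path],
    backup_root: Path,
    stop_master: Callable[[], None],   # hypothetical hook: stop kudu-master
    start_master: Callable[[], None],  # hypothetical hook: start kudu-master
) -> Path:
    """Copy the master's data and WAL directories while the master is down."""
    target = Path(backup_root) / time.strftime("master-backup-%Y%m%dT%H%M%S")
    stop_master()
    try:
        for i, src in enumerate(master_dirs):
            src = Path(src)
            # Prefix with an index so two dirs with the same basename can't collide;
            # copytree creates `target` and any missing parents.
            shutil.copytree(src, target / f"{i}_{src.name}")
    finally:
        # Bring the master back up even if one of the copies fails.
        start_master()
    return target
```

Restoring would then mean copying those trees back to their original locations while the master is stopped. If the filesystem supports snapshots, you can skip the stop/start hooks and copy from a snapshot instead.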

For restoring a metadata backup, it's as simple as restoring the file
system data for the master. For restoring a data backup, you could first
drop the tables, recreate them, then run a MapReduce job that upserts all
the data from the snapshot scan.
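Upserting instead of plain inserting makes the restore job safe to re-run. An upsert inserts a row whose primary key is absent and overwrites it otherwise, so retrying a restore that died halfway does not fail on duplicate keys. The toy Python model below uses a dict keyed by primary key to stand in for the recreated table. `restore_with_upsert` is a hypothetical name, and a real job would issue Kudu upsert operations instead.

```python
from typing import Dict, Iterable, Tuple

Row = Tuple[int, str]  # (primary key, payload) in this toy model


def restore_with_upsert(table: Dict[int, str], backup_rows: Iterable[Row]) -> Dict[int, str]:
    """Replay backed-up rows into a (recreated) table with upsert semantics."""
    for key, payload in backup_rows:
        # Dict assignment inserts a new key or overwrites an existing one,
        # which mirrors what an upsert does for a primary key.
        table[key] = payload
    return table


backup = [(1, "alice"), (2, "bob"), (3, "carol")]
recreated: Dict[int, str] = {}             # the freshly dropped-and-recreated table
restore_with_upsert(recreated, backup[:2])  # first attempt dies after two rows
restore_with_upsert(recreated, backup)      # retry the whole job from the start
print(recreated)  # {1: 'alice', 2: 'bob', 3: 'carol'}
```

With plain inserts, the retry would be rejected for keys 1 and 2. With upserts, it converges to the same contents no matter how many times it runs.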

All in all, backup and restore is something that will probably get
worked on very soon, so thanks for reminding us. We know we need to
document these procedures and make them easier and less rough around the
edges.

Although I know this has been discussed in the past, I couldn't find a JIRA,
so I filed https://issues.apache.org/jira/browse/KUDU-1575 to track this.


On Wed, Aug 17, 2016 at 7:05 PM, Mac Noland <mcdonaldnoland@gmail.com> wrote:

> From an Impala perspective, is making a scheduled copy of the table into
> HDFS an option for you?
> http://kudu.apache.org/faq.html
> How can I back up my Kudu data?
> <http://kudu.apache.org/faq.html#how-can-i-back-up-my-kudu-data>
> Kudu doesn’t yet have a built-in backup mechanism. Similar to bulk loading
> data, Impala can help if you have it available. You can use it to copy your
> data into Parquet format using a statement like:
> INSERT INTO TABLE some_parquet_table SELECT * FROM kudu_table
> then use distcp <http://hadoop.apache.org/docs/r1.2.1/distcp2.html> to
> copy the Parquet data to another cluster. While Kudu is in beta, we’re not
> expecting people to deploy mission-critical workloads on it yet.
> On Wed, Aug 17, 2016 at 7:07 AM, Amit Adhau <amit.adhau@globant.com>
> wrote:
>> Hi Kudu team,
>> Can you please suggest what would be the best way/policy to backup and
>> restore the Kudu metadata/data on kudu side as well as on Impala side and
>> also, if that can be automated.
>> --
>> Thanks & Regards,
>> *Amit Adhau* | Data Architect
>> *GLOBANT* | IND:+91 9821518132

View raw message