hbase-user mailing list archives

From Behdad Forghani <beh...@exapackets.com>
Subject Re: Automating major compactions
Date Wed, 08 Jul 2015 17:18:43 GMT
To start major compaction for tablename from cli, you need to run:
echo "major_compact 'tablename'" | hbase shell

I do this after bulk loading to the table.

FYI, to avoid surprises, I also turn off the load balancer and rebalance
regions manually.

The cli command to turn off balancer is:
echo balance_switch false | hbase shell

To rebalance regions after a bulk load or other changes, run:
echo balance | hbase shell
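
The three commands above could be wrapped in a small script along these lines. This is only a sketch: HBASE_SHELL_CMD and the function names are hypothetical, and it assumes "hbase" is on the PATH of the machine where it runs.

```shell
#!/bin/sh
# Sketch only: HBASE_SHELL_CMD and the function names are hypothetical.
# By default each command is piped into "hbase shell", as in the message.
HBASE_SHELL_CMD="${HBASE_SHELL_CMD:-hbase shell}"

# Pipe a single command line into the HBase shell.
hbase_cmd() {
    printf '%s\n' "$1" | $HBASE_SHELL_CMD
}

# Switch the load balancer off, e.g. before a bulk load.
balancer_off() {
    hbase_cmd "balance_switch false"
}

# After a bulk load: major-compact the table, then request a rebalance.
# $1 = table name
after_bulk_load() {
    hbase_cmd "major_compact '$1'" && hbase_cmd "balance"
}
```

Typical use would be to call balancer_off, run the bulk load, and then call after_bulk_load with the table name. Whether "balance" has any effect while the balancer is switched off may depend on the HBase version, so that is worth checking on your cluster.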

You can run these commands over ssh. I use Ansible for this.
Assuming you have defined hbase_master in your hosts file, you can run:
ansible -i hosts hbase_master -a "echo major_compact tablename | hbase
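
The Ansible command above is cut off in the archive, so what follows is only a sketch of the same idea, not the original command. It assumes an inventory file with an hbase_master group, as described above. ANSIBLE_BIN and run_on_master are hypothetical names. One known detail: Ansible's default command module does not interpret pipes, so a piped command needs the shell module (-m shell).

```shell
#!/bin/sh
# Sketch only: ANSIBLE_BIN and run_on_master are hypothetical names.
ANSIBLE_BIN="${ANSIBLE_BIN:-ansible}"

# Run one HBase shell command on the hosts in the hbase_master group.
# $1 = inventory file, $2 = HBase shell command line
run_on_master() {
    # -m shell is used because the default command module does not
    # pass the pipe through a shell on the remote host.
    $ANSIBLE_BIN -i "$1" hbase_master -m shell -a "echo \"$2\" | hbase shell"
}
```

For example, run_on_master hosts "major_compact 'tablename'" would ask Ansible to run the compaction command on the master host.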

Behdad Forghani

On Wed, Jul 8, 2015 at 8:03 AM, Dejan Menges <dejan.menges@gmail.com> wrote:

> Hi,
> What's the best way to automate major compactions without enabling it
> during off peak period?
> What I was testing is simple script which runs on every node in cluster,
> checks if there is major compaction already running on that node, if not
> picks one region for compaction and run compaction on that one region.
> It's running for some time and it helped us get our data to much better
> shape, but now I'm not quite sure how to choose anymore which region to
> compact. So far I was reading for that node rs-status#regionStoreStats and
> first choosing the one with biggest amount of storefiles, and then those
> with biggest storefile sizes.
> Is there maybe something more intelligent I could/should do?
> Thanks a lot!
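
The selection rule described in the quoted message (most storefiles first, then largest storefile size) can be sketched as a sort over a per-region listing. The input format below is hypothetical: it assumes the region statistics have already been extracted into one line per region, which the message does not show how to do.

```shell
#!/bin/sh
# Sketch only: pick_region and its input format are hypothetical.
# Input on stdin, one region per line, whitespace separated:
#   <region_name> <storefile_count> <storefile_size_mb>
# Prints the name of the region with the most storefiles; ties are
# broken by the larger storefile size.
pick_region() {
    sort -k2,2nr -k3,3nr | head -n 1 | awk '{print $1}'
}
```

The chosen region name could then be passed to major_compact in the HBase shell, which accepts a region name as well as a table name.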
