phoenix-user mailing list archives

From "Saurabh Agarwal (BLOOMBERG/ 731 LEX)" <>
Subject Re: Table replication
Date Wed, 15 Jun 2016 00:43:58 GMT
Hi James,

Thanks for providing the detailed info on replication. Two questions.

1. I am not clear on how replication works in terms of views. Is this an open issue wrt replication?

2. You mention that there is still work required wrt the combination of transactions and
replication. Does this work need to be done in HBase or Phoenix? Are there any existing JIRAs
for this work?

Sent from Bloomberg Professional for iPhone 

----- Original Message -----
From: James Taylor <>
At: 09-Jun-2016 11:42:46

Hi JM,
Are you looking toward replication to support DR? If so, you can rely on HBase-level replication
with a few gotchas and some operational hurdles:

- When upgrading Phoenix versions, upgrade the server-side first for both the primary and
secondary cluster. You can do a rolling upgrade and old clients will continue to work with
the upgraded server, so no downtime is required (see Backward Compatibility[1] for more details).
- Execute Phoenix DDL (i.e. user-level changes to existing Phoenix tables, creation of new
tables, indexes, sequences) against both the primary and secondary cluster with replication
suspended (as otherwise you end up with a race condition for the replication of the SYSTEM.CATALOG
table and any not yet existing tables). If you've upgraded Phoenix, then even if there's no
DDL, you should at a minimum connect a Phoenix client to both the primary and secondary cluster
to trigger any upgrades to Phoenix system tables. Once the DDL is complete, resume replication.

- Do not replicate the SYSTEM.SEQUENCE table since replication is asynchronous and may fall
behind which would be a big issue if switching over to the secondary cluster as sequence values
could start repeating. Instead, incorporate a cluster ID into any sequence-based identifiers
and concatenate this with the sequence value. In that way, the identifiers will continue to
be unique after a DR event.
- Replicate Phoenix indexes just like data tables as the HBase-level replication of the data
table will not trigger index updates.
- In theory, you really only need to replicate views from SYSTEM.CATALOG since you're executing
DDL on both the primary and secondary cluster, however I don't think HBase has that capability
(but it sure would be nice). FWIW, we're thinking of separating views from table definitions
into separate Phoenix tables but need to first make these tables transactional (we're using
an HBase mechanism that allows all or none commits to the SYSTEM.CATALOG, but it only works
if all updates are to the same RS which is too limiting).
- It's a good idea to monitor the depth of the replication queue so you know if/when replication
is falling behind.
- Care has to be taken wrt keeping deleted cells on both clusters if you want to support point-in-time
backup and restore, as it's possible that compaction would remove cells before your backup
window has passed (this is orthogonal to replication, but just wanted to bring it up).
- Given the asynchronous nature of HBase replication, there's no good way of knowing the transaction
ID (i.e. timestamp) at which you have all of the data. Also, replication of the state that
is kept by the transaction manager in terms of inflight and invalid transactions is left as
an exercise to the reader. :-) In short - there's still some work to do wrt the combination
of transactions and replication (but it'd be really interesting work if anyone is interested).
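
To illustrate the cluster ID idea from the SYSTEM.SEQUENCE item above, here is a minimal
Python sketch. The function name, the "-" separator, the zero padding, and the cluster IDs
"A" and "B" are assumptions made up for illustration, not Phoenix APIs; the sequence value
itself would come from NEXT VALUE FOR on whichever cluster is currently active.

```python
def make_global_id(cluster_id: str, seq_value: int, width: int = 19) -> str:
    """Combine a cluster ID with a sequence value into one identifier.

    Hypothetical helper: cluster_id is a short label you assign to each
    cluster, seq_value is the number handed out by that cluster's own
    (non-replicated) sequence.
    """
    if not cluster_id or "-" in cluster_id:
        raise ValueError("cluster_id must be non-empty and must not contain '-'")
    if seq_value < 0:
        raise ValueError("seq_value must be non-negative")
    # Zero-pad to 19 digits (the length of the largest signed 64-bit value)
    # so identifiers from the same cluster sort in sequence order.
    return f"{cluster_id}-{seq_value:0{width}d}"


# Before a DR event the primary ("A") hands out value 1001.
id_on_primary = make_global_id("A", 1001)
# After failover the secondary ("B") may hand out 1001 again, because its
# own sequence was never replicated. The cluster prefix keeps the IDs distinct.
id_on_secondary = make_global_id("B", 1001)
```

The same sequence value on two clusters yields two different identifiers, which is the
property that matters after switching over.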

HTH. Thanks,



On Thu, Jun 9, 2016 at 7:56 AM, anil gupta <> wrote:

Hi Jean,

Phoenix does not support replication at present (it would be super awesome if it could). So,
if you want to replicate Phoenix tables, you will need to set up replication of all
the underlying HBase tables for the corresponding Phoenix tables.

I think you will need to replicate all the Phoenix system HBase tables, the global/local secondary
index tables, and then the primary Phoenix table.

I haven't done it yet, but the above is the way I would approach it.

Anil Gupta.

On Thu, Jun 9, 2016 at 6:49 AM, Jean-Marc Spaggiari <> wrote:


When Phoenix is used, what is the recommended way to do replication?

Replication acts as a client on the 2nd cluster, so should we simply configure Phoenix on
both clusters, so that on the destination it takes care of updating the index tables, etc.? Or
should all the tables on the source side, including Phoenix tables, be replicated to
the destination side too? I searched a bit about that on the Phoenix site and Google and did
not find anything.



Thanks & Regards,
Anil Gupta
