phoenix-dev mailing list archives

From "Geoffrey Jacoby (Jira)" <j...@apache.org>
Subject [jira] [Updated] (PHOENIX-5315) Cross cluster replication of the base table only should be sufficient
Date Fri, 11 Oct 2019 18:10:00 GMT

     [ https://issues.apache.org/jira/browse/PHOENIX-5315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Geoffrey Jacoby updated PHOENIX-5315:
-------------------------------------
    Description: 
When replicating Phoenix tables using the HBase cross-cluster replication facility, it should
be sufficient (and, for correctness and the avoidance of race conditions and inconsistencies,
must be sufficient) to replicate the base table only. On the sink cluster, the replication
client's application of mutations from the replication stream to the local base table should
trigger all necessary index update operations. To the extent that this does not happen today
due to implementation details, those details should be reworked.

This also has important efficiency benefits: no matter how many indexes are defined for a
base table, only the base table updates need be replicated (presuming the Phoenix schema is
synchronized across all sites by some other, external means).

This work would likely comprise multiple components, so we should use this issue as an umbrella.
We'd need:
 # A Phoenix implementation of HBase's ReplicationEndpoint that tails the WAL like a normal
replication endpoint. However, rather than writing to HBase's replication sink APIs (which
issue HBase RPCs to a remote cluster), it should write to a new Phoenix Endpoint coprocessor.
 # An HBase coprocessor Endpoint hook that takes in a request from a remote cluster (containing
both the WALEdit's data and the WALKey's annotated metadata, telling the remote cluster which
tenant_id, logical table name, and timestamp the data is associated with). Ideally the API's
message format should be configurable, and could be either a protobuf or an Avro schema similar
to the one described by PHOENIX-5443. The endpoint hook would take the metadata and data and
regenerate a complete set of Phoenix mutations, both data and index, just as the Phoenix
client did for the original SQL statement that generated the source-side edits. These mutations
would be written to the remote cluster by the normal Phoenix write path.

(Unfortunately, HBase uses the term "endpoint" to mean both a replication plugin AND a stored-procedure-like
coprocessor hook. To be clear, item 1 above is a replication plugin and item 2 is a coprocessor hook.)
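The two components above can be sketched, very roughly, in plain Java. This is a flow model only: WalEntry, Mutation, and the two endpoint classes are hypothetical stand-ins, not the real APIs. The actual item 1 would extend org.apache.hadoop.hbase.replication.ReplicationEndpoint, and the actual item 2 would be a protobuf-defined coprocessor service; the point here is simply that only base-table WAL entries cross the wire, and the sink regenerates the index mutations locally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Metadata the annotated WALKey would carry, plus the WALEdit's cells (stand-in type). */
record WalEntry(String tenantId, String logicalTable, long timestamp,
                Map<String, String> cells) {}

/** A Phoenix-level mutation regenerated on the sink cluster (stand-in type). */
record Mutation(String table, Map<String, String> values, long timestamp) {}

/** Item 2 (sketch): the sink-side coprocessor Endpoint hook. From one shipped
 *  base-table entry it regenerates the full set of Phoenix mutations. */
class PhoenixReplicationSinkEndpoint {
    // Logical base table -> index tables maintained for it. Assumes the Phoenix
    // schema was synchronized to this cluster by some external means.
    private final Map<String, List<String>> indexes;

    PhoenixReplicationSinkEndpoint(Map<String, List<String>> indexes) {
        this.indexes = indexes;
    }

    List<Mutation> apply(WalEntry entry) {
        List<Mutation> out = new ArrayList<>();
        // The base-table mutation itself.
        out.add(new Mutation(entry.logicalTable(), entry.cells(), entry.timestamp()));
        // One derived mutation per index. The real Phoenix index maintainers would
        // compute the actual index row; here we just reuse the cells so the flow
        // (base edit in, base + index mutations out) is visible.
        for (String idx : indexes.getOrDefault(entry.logicalTable(), List.of())) {
            out.add(new Mutation(idx, entry.cells(), entry.timestamp()));
        }
        return out;
    }
}

/** Item 1 (sketch): the source-side replication endpoint. It tails the WAL and
 *  ships base-table edits only; index edits never need to be replicated. */
class PhoenixReplicationSourceEndpoint {
    private final PhoenixReplicationSinkEndpoint remote; // stands in for the RPC to the sink

    PhoenixReplicationSourceEndpoint(PhoenixReplicationSinkEndpoint remote) {
        this.remote = remote;
    }

    List<Mutation> replicate(List<WalEntry> batch) {
        List<Mutation> applied = new ArrayList<>();
        for (WalEntry e : batch) {
            applied.addAll(remote.apply(e));
        }
        return applied;
    }
}
```

Even in this toy form, the efficiency argument is visible: a base table with N indexes ships one WAL entry and the sink fans it out into N+1 mutations via the normal write path.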

 

  was:
When replicating Phoenix tables using the HBase cross-cluster replication facility, it should
be sufficient (and, for correctness and the avoidance of race conditions and inconsistencies,
must be sufficient) to replicate the base table only. On the sink cluster, the replication
client's application of mutations from the replication stream to the local base table should
trigger all necessary index update operations. To the extent that this does not happen today
due to implementation details, those details should be reworked.

This also has important efficiency benefits: no matter how many indexes are defined for a
base table, only the base table updates need be replicated (presuming the Phoenix schema is
synchronized across all sites by some other, external means).


> Cross cluster replication of the base table only should be sufficient
> ---------------------------------------------------------------------
>
>                 Key: PHOENIX-5315
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5315
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Andrew Kyle Purtell
>            Priority: Major
>
> When replicating Phoenix tables using the HBase cross-cluster replication facility, it
should be sufficient (and, for correctness and the avoidance of race conditions and inconsistencies,
must be sufficient) to replicate the base table only. On the sink cluster, the replication
client's application of mutations from the replication stream to the local base table should
trigger all necessary index update operations. To the extent that this does not happen today
due to implementation details, those details should be reworked.
> This also has important efficiency benefits: no matter how many indexes are defined for
a base table, only the base table updates need be replicated (presuming the Phoenix schema
is synchronized across all sites by some other, external means).
> This work would likely comprise multiple components, so we should use this issue as an
umbrella. We'd need:
>  # A Phoenix implementation of HBase's ReplicationEndpoint that tails the WAL like a
normal replication endpoint. However, rather than writing to HBase's replication sink APIs
(which issue HBase RPCs to a remote cluster), it should write to a new Phoenix Endpoint
coprocessor.
>  # An HBase coprocessor Endpoint hook that takes in a request from a remote cluster (containing
both the WALEdit's data and the WALKey's annotated metadata, telling the remote cluster which
tenant_id, logical table name, and timestamp the data is associated with). Ideally the API's
message format should be configurable, and could be either a protobuf or an Avro schema similar
to the one described by PHOENIX-5443. The endpoint hook would take the metadata and data and
regenerate a complete set of Phoenix mutations, both data and index, just as the Phoenix
client did for the original SQL statement that generated the source-side edits. These mutations
would be written to the remote cluster by the normal Phoenix write path.
> (Unfortunately, HBase uses the term "endpoint" to mean both a replication plugin AND
a stored-procedure-like coprocessor hook. To be clear, item 1 above is a replication plugin
and item 2 is a coprocessor hook.)
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
