phoenix-dev mailing list archives

From "Samarth Jain (JIRA)" <>
Subject [jira] [Created] (PHOENIX-3645) Build a mechanism for creating a table and populating it with data from a source table
Date Thu, 02 Feb 2017 00:43:51 GMT
Samarth Jain created PHOENIX-3645:

             Summary: Build a mechanism for creating a table and populating it with data from
a source table
                 Key: PHOENIX-3645
             Project: Phoenix
          Issue Type: New Feature
            Reporter: Samarth Jain

As part of PHOENIX-1598, we are introducing the capability of mapping column names and encoding
column values. For users to take advantage of this new scheme, they would need to recreate their
tables from scratch. For situations like this, it would be nice to have a mechanism where
we can create a new table and fill it with data from the existing table.

A simple possibility is to disable the source table, take a snapshot of it, create a new table
from that snapshot, and drop the old table. However, this would require downtime.
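For reference, the snapshot route above would look roughly like this in the HBase shell (table and snapshot names here are placeholders, and a plain clone would not by itself apply the new encoding scheme):

```
hbase> disable 'MY_TABLE'
hbase> snapshot 'MY_TABLE', 'my_table_snap'
hbase> clone_snapshot 'my_table_snap', 'MY_TABLE_NEW'
hbase> drop 'MY_TABLE'
```

The table stays offline from the disable until clients are repointed at the new table, which is the downtime referred to above.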

Another way would be to use an UPSERT INTO target table SELECT * FROM source table statement
or a MapReduce bulk load job. These mechanisms, though, have the inherent limitation that they miss
updates made to the old table after they were kicked off or after they completed. To handle
the case of these missing updates, a somewhat crazy idea would be to mark the new table as an
index on the existing table. The index table would have the exact same schema as the data
table. Incremental changes would then be taken care of automatically by our index maintenance mechanism.
We could then use our existing MapReduce index build job to bulk load the "old" data into the
new table.
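The UPSERT-based copy might look like the following Phoenix SQL. Table and column names are illustrative, and the COLUMN_ENCODED_BYTES property comes from the PHOENIX-1598 work, so its final name and values may differ:

```sql
-- Create the target table with the new column-encoding scheme enabled
CREATE TABLE NEW_TABLE (
    ID   BIGINT  NOT NULL PRIMARY KEY,
    COL1 VARCHAR,
    COL2 INTEGER
) COLUMN_ENCODED_BYTES = 2;

-- One-shot copy; rows written to OLD_TABLE after this starts are missed
UPSERT INTO NEW_TABLE SELECT * FROM OLD_TABLE;
```

The second statement is exactly the kind of point-in-time copy whose missed-update problem the index-based idea below is meant to solve.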

There is a slight chance that we would miss updates happening to the source table while
we are in the process of doing the index-to-table conversion.

One way to handle that would be to store the physical HBase table name for a Phoenix table in
SYSTEM.CATALOG. The reducer of the MapReduce job would then simply have to change this
mapping in the SYSTEM.CATALOG table, which should cause new updates to go to the new HBase table.
There are probably some edge cases or gotchas that I am not thinking of right now. [~jamestaylor]
probably has more thoughts on this.

This message was sent by Atlassian JIRA
