hive-issues mailing list archives

From "anishek (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-18341) Add repl load support for adding "raw" namespace for TDE with same encryption keys
Date Thu, 28 Dec 2017 08:37:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-18341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

anishek updated HIVE-18341:
---------------------------
    Attachment: HIVE-18341.1.patch

[~thejas] I have included the changes described on the distcp page for "/.reserved/raw",
however the distcp copy fails with a "Check-sum mismatch" exception. This shouldn't
have happened, since the two zones use the same keys. Output from the logs:
{code}
Check-sum mismatch between hdfs://localhost:53536/.reserved/raw/warehouse0/targetandsourcehavesameencryptionzonekeys_1514449998552.db/encrypted_table/000000_0_copy_1
and hdfs://localhost:53536/.reserved/raw/warehouse1/replicated_targetandsourcehavesameencryptionzonekeys_1514449998552.db/encrypted_table/.hive-staging_hive_2017-12-28_00-33-30_893_6165151359381350374-1/-ext-10001/.distcp.tmp.attempt_local327098851_0003_m_000000_0
{code}
The test case is "targetAndSourceHaveSameEncryptionZoneKeys".

Additionally, I have included changes so that regular file copies (used when there is just
1 file or the file size is small) are done under *doAs*, using the user configuration provided
for distcp ("hive.distcp.privileged.doAs").

One thing to note: since regular file copies use the fileSystem copy, even for TDE
deployments with the same keys we won't be able to leverage the optimization that distcp does.
This will be of particular interest for ACID table replication, where we will mostly transfer
1 delta file per table within a transaction.

> Add repl load support for adding "raw" namespace for TDE with same encryption keys
> ----------------------------------------------------------------------------------
>
>                 Key: HIVE-18341
>                 URL: https://issues.apache.org/jira/browse/HIVE-18341
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: anishek
>            Assignee: anishek
>             Fix For: 3.0.0
>
>         Attachments: HIVE-18341.0.patch, HIVE-18341.1.patch
>
>
> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/TransparentEncryption.html#Running_as_the_superuser
> "a new virtual path prefix, /.reserved/raw/, that gives superusers direct access to the
underlying block data in the filesystem. This allows superusers to distcp data without needing
having access to encryption keys, and also avoids the overhead of decrypting and re-encrypting
data."
> We need to introduce a new option in "Repl Load" command that will change the files being
copied in distcp to have this "/.reserved/raw/" namespace before the file paths.
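
For illustration only, the path rewrite described in the issue could be sketched as below. This is
a minimal sketch and not the code in the attached patch: the class name RawPathSketch, the method
toRawPath and the example file path are hypothetical, and it uses only java.net.URI rather than the
Hadoop Path API.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not taken from HIVE-18341.1.patch.
public class RawPathSketch {
    // Virtual prefix documented for HDFS transparent encryption.
    static final String RAW_PREFIX = "/.reserved/raw";

    // Returns the same URI with "/.reserved/raw" placed before the file path.
    // URIs that already point into the raw namespace are returned unchanged.
    static URI toRawPath(URI source) throws URISyntaxException {
        String path = source.getPath();
        if (path.equals(RAW_PREFIX) || path.startsWith(RAW_PREFIX + "/")) {
            return source;
        }
        return new URI(source.getScheme(), source.getAuthority(),
                RAW_PREFIX + path, null, null);
    }

    public static void main(String[] args) throws URISyntaxException {
        // Example path is made up for this sketch.
        URI src = new URI("hdfs://localhost:53536/warehouse0/db/encrypted_table/000000_0");
        System.out.println(toRawPath(src));
    }
}
```

Both the source and the target paths handed to distcp would need the prefix, and the copy would
have to run as an HDFS superuser, as the quoted documentation states.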



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
