hive-issues mailing list archives

From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-20911) External Table Replication for Hive
Date Fri, 14 Dec 2018 16:59:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-20911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721601#comment-16721601
] 

Hive QA commented on HIVE-20911:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12951813/HIVE-20911.03.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/15326/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15326/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15326/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and
output '+ date '+%Y-%m-%d %T.%3N'
2018-12-14 16:56:29.365
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-15326/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2018-12-14 16:56:29.428
+ cd apache-github-source-source
+ git fetch origin
From https://github.com/apache/hive
   687aeef..64930f8  master     -> origin/master
+ git reset --hard HEAD
HEAD is now at 687aeef HIVE-21035: Race condition in SparkUtilities#getSparkSession (Antal
Sinkovits, reviewed by Adam Szita, Denys Kuzmenko)
+ git clean -f -d
Removing standalone-metastore/metastore-server/src/gen/
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.
  (use "git pull" to update your local branch)
+ git reset --hard origin/master
HEAD is now at 64930f8 HIVE-21028: Adding a JDO fetch plan for getTableMeta get_table_meta
to avoid race condition(Karthik Manamcheri, reviewed by Adam Holley, Vihang K and Naveen G)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2018-12-14 16:56:39.652
+ rm -rf ../yetus_PreCommit-HIVE-Build-15326
+ mkdir ../yetus_PreCommit-HIVE-Build-15326
+ git gc
+ cp -R . ../yetus_PreCommit-HIVE-Build-15326
+ mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-15326/yetus
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch
error: a/common/src/java/org/apache/hadoop/hive/common/FileUtils.java: does not exist in index
error: a/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java: does not exist in index
error: a/itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenarios.java:
does not exist in index
error: a/itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcrossInstances.java:
does not exist in index
error: a/itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosIncrementalLoadAcidTables.java:
does not exist in index
error: a/itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/WarehouseInstance.java:
does not exist in index
error: a/ql/src/java/org/apache/hadoop/hive/ql/Context.java: does not exist in index
error: a/ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java: does not exist in index
error: a/ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplDumpTask.java: does not exist
in index
error: a/ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadTask.java: does not exist
in index
error: a/ql/src/java/org/apache/hadoop/hive/ql/exec/repl/ReplLoadWork.java: does not exist
in index
error: a/ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/events/filesystem/BootstrapEventsIterator.java:
does not exist in index
error: a/ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/events/filesystem/DatabaseEventsIterator.java:
does not exist in index
error: a/ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/events/filesystem/FSTableEvent.java:
does not exist in index
error: a/ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadPartitions.java:
does not exist in index
error: a/ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/table/LoadTable.java:
does not exist in index
error: a/ql/src/java/org/apache/hadoop/hive/ql/exec/repl/incremental/IncrementalLoadTasksBuilder.java:
does not exist in index
error: a/ql/src/java/org/apache/hadoop/hive/ql/exec/repl/util/ReplUtils.java: does not exist
in index
error: a/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java: does not exist in index
error: a/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java: does not exist
in index
error: a/ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java: does not
exist in index
error: a/ql/src/java/org/apache/hadoop/hive/ql/parse/ReplicationSemanticAnalyzer.java: does
not exist in index
error: a/ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java: does not exist in
index
error: a/ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/Utils.java: does not exist
in index
error: a/ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/events/DropTableHandler.java:
does not exist in index
error: a/ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/events/InsertHandler.java:
does not exist in index
error: a/ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/io/PartitionSerializer.java:
does not exist in index
error: a/ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/io/TableSerializer.java: does
not exist in index
error: a/ql/src/java/org/apache/hadoop/hive/ql/parse/repl/load/MetadataJson.java: does not
exist in index
error: a/ql/src/java/org/apache/hadoop/hive/ql/parse/repl/load/message/InsertHandler.java:
does not exist in index
error: a/ql/src/java/org/apache/hadoop/hive/ql/parse/repl/load/message/MessageHandler.java:
does not exist in index
error: a/ql/src/java/org/apache/hadoop/hive/ql/parse/repl/load/message/TableHandler.java:
does not exist in index
error: a/ql/src/test/org/apache/hadoop/hive/ql/exec/repl/TestReplDumpTask.java: does not exist
in index
error: patch failed: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosIncrementalLoadAcidTables.java:26
Falling back to three-way merge...
Applied patch to 'itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosIncrementalLoadAcidTables.java'
with conflicts.
Going to apply patch with: git apply -p1
error: patch failed: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosIncrementalLoadAcidTables.java:26
Falling back to three-way merge...
Applied patch to 'itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosIncrementalLoadAcidTables.java'
with conflicts.
U itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosIncrementalLoadAcidTables.java
+ result=1
+ '[' 1 -ne 0 ']'
+ rm -rf yetus_PreCommit-HIVE-Build-15326
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12951813 - PreCommit-HIVE-Build

> External Table Replication for Hive
> -----------------------------------
>
>                 Key: HIVE-20911
>                 URL: https://issues.apache.org/jira/browse/HIVE-20911
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 4.0.0
>            Reporter: anishek
>            Assignee: anishek
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-20911.01.patch, HIVE-20911.02.patch, HIVE-20911.03.patch
>
>
> External tables are currently not replicated as part of Hive replication. This jira aims
> to enable that.
> Approach:
> * The target cluster will have a top-level base directory config that will be used to copy
> all data relevant to external tables. This will be provided via the *with* clause of the *repl
> load* command. This base path will be prefixed to the path of the same external table on the
> source cluster.
> * Since the directories backing an external table can change without Hive knowing about it,
> we cannot capture the relevant events whenever data is added or removed. We therefore have
> to copy the data from the source path to the target path for external tables every time we
> run incremental replication.
> ** This will require incremental *repl dump* to create an additional file *\_external\_tables\_info*
> with entries of the following form:
> {code}
> tableName,base64Encoded(tableDataLocation)
> {code}
> If partitions of a table point to different locations, the file will contain multiple entries
> for the same table name, one per partition location. Partitions created without an explicit
> _set location_ clause live under the table's data location and therefore do not get separate
> entries in the file.
> ** *repl load* will read *\_external\_tables\_info* to identify which locations are to be
> copied from source to target, and will create corresponding tasks for them.
> * New external tables will be created with metadata only; no data is copied as part of the
> regular tasks during incremental or bootstrap load.
> * Bootstrap dump will also create *\_external\_tables\_info*, which will be used to copy
> data from source to target as part of bootstrap load.
> * Bootstrap load will create a DAG that can exploit parallelism in the execution phase; the
> HDFS copy tasks are created once the bootstrap phase is complete.
> * Since incremental load results in a DAG with only sequential execution (events applied
> in sequence), to make effective use of parallelism in the execution phase we create the HDFS
> copy tasks alongside the incremental DAG. This requires some basic calculations to approximately
> meet the configured value of "hive.repl.approx.max.load.tasks".
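
The *\_external\_tables\_info* line format described above (tableName,base64Encoded(tableDataLocation)) can be sketched as follows. This is illustrative only, not Hive's actual implementation; the class and method names are invented:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper mirroring the described file format:
//   tableName,base64Encoded(tableDataLocation)
// Base64-encoding the location keeps commas and other special
// characters in paths from breaking the comma-separated layout.
public class ExternalTablesInfoLine {

    static String encode(String tableName, String dataLocation) {
        String b64 = Base64.getEncoder()
                .encodeToString(dataLocation.getBytes(StandardCharsets.UTF_8));
        return tableName + "," + b64;
    }

    /** Returns {tableName, dataLocation} parsed from one file line. */
    static String[] decode(String line) {
        int comma = line.indexOf(',');
        String table = line.substring(0, comma);
        String location = new String(
                Base64.getDecoder().decode(line.substring(comma + 1)),
                StandardCharsets.UTF_8);
        return new String[] { table, location };
    }

    public static void main(String[] args) {
        String line = encode("sales", "hdfs://src:8020/warehouse/sales");
        String[] parsed = decode(line);
        System.out.println(line);
        System.out.println(parsed[0] + " -> " + parsed[1]);
    }
}
```

A table whose partitions point to several locations would simply contribute one such line per location, all with the same table name.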
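
The base-path prefixing in the first bullet might look like the sketch below, under the assumption that the configured base directory is simply prepended to the path component of the source location; the helper name is invented and this is not Hive's code:

```java
import java.net.URI;

// Hypothetical mapper from a source external-table location to its
// location under the target cluster's configured base directory.
public class ExternalTablePathMapper {

    // e.g. base "/replica/external" + "hdfs://src:8020/warehouse/sales"
    //   -> "/replica/external/warehouse/sales" on the target cluster
    static String targetLocation(String baseDir, String sourceLocation) {
        // Drop the source scheme/authority; keep only the path component.
        String path = URI.create(sourceLocation).getPath();
        String base = baseDir.endsWith("/")
                ? baseDir.substring(0, baseDir.length() - 1)
                : baseDir;
        return base + path;
    }

    public static void main(String[] args) {
        System.out.println(
            targetLocation("/replica/external", "hdfs://src:8020/warehouse/sales"));
    }
}
```

Keeping the source path structure under one base directory means a single config value on the target side is enough to derive every external table's destination deterministically.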



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
