hive-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Work logged] (HIVE-21109) Stats replication for ACID tables.
Date Mon, 01 Apr 2019 11:14:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-21109?focusedWorklogId=221241&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-221241 ]

ASF GitHub Bot logged work on HIVE-21109:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 01/Apr/19 11:13
            Start Date: 01/Apr/19 11:13
    Worklog Time Spent: 10m 
      Work Description: sankarh commented on pull request #579: HIVE-21109 : Support stats replication for ACID tables.
URL: https://github.com/apache/hive/pull/579#discussion_r270818464
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java
 ##########
 @@ -297,21 +303,34 @@ private ColumnStatisticsDesc getColumnStatsDesc(String dbName,
 
   private int persistColumnStats(Hive db) throws HiveException, MetaException, IOException {
     ColumnStatistics colStats = constructColumnStatsFromInput();
-    ColumnStatisticsDesc colStatsDesc = colStats.getStatsDesc();
-    // We do not support stats replication for a transactional table yet. If we are converting
-    // a non-transactional table to a transactional table during replication, we might get
-    // column statistics but we shouldn't update those.
-    if (work.getColStats() != null &&
-        AcidUtils.isTransactionalTable(getHive().getTable(colStatsDesc.getDbName(),
-                                                          colStatsDesc.getTableName()))) {
-      LOG.debug("Skipped updating column stats for table " +
-                TableName.getDbTable(colStatsDesc.getDbName(), colStatsDesc.getTableName()) +
-                " because it is converted to a transactional table during replication.");
-      return 0;
-    }
-
     SetPartitionsStatsRequest request =
             new SetPartitionsStatsRequest(Collections.singletonList(colStats));
+
+    // Set writeId and validWriteId list for replicated statistics.
+    if (work.getColStats() != null) {
+      String dbName = colStats.getStatsDesc().getDbName();
+      String tblName = colStats.getStatsDesc().getTableName();
+      Table tbl = db.getTable(dbName, tblName);
+      long writeId = work.getWriteId();
+      // If it's a transactional table on source and target, we will get a valid writeId
+      // associated with it. Otherwise it's a non-transactional table on source migrated to a
+      // transactional table on target, we need to craft a valid writeId here.
+      if (AcidUtils.isTransactionalTable(tbl)) {
+        ValidWriteIdList writeIds;
+        if (writeId <= 0) {
+          Long tmpWriteId = ReplUtils.getMigrationCurrentTblWriteId(conf);
+          if (tmpWriteId == null) {
+            throw new HiveException("DDLTask : Write id is not set in the config by open txn task for migration");
+          }
+          writeId = tmpWriteId;
+        }
+        writeIds = new ValidReaderWriteIdList(TableName.getDbTable(dbName, tblName), new long[0],
 
 Review comment:
   I think this assumption can change in the future if someone uses this task to update stats even in a non-repl flow. I suggest adding an explicit check for repl scope.
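
   As an illustration only, the suggested check could be sketched as below. This is a simplified, standalone stand-in and not the code in the pull request: the class `ReplScopeCheckSketch`, the method `resolveWriteId`, and the `inReplScope` flag are hypothetical names, and how the task would actually learn that it runs in replication scope is an assumption left open here.

```java
import java.util.OptionalLong;

// Hypothetical, simplified stand-in for the writeId selection discussed above.
// None of these names are Hive APIs.
final class ReplScopeCheckSketch {

    /**
     * Picks the writeId to attach to a column stats update.
     *
     * @param inReplScope       true only when the task runs as part of replication (assumed flag)
     * @param transactional     whether the target table is transactional
     * @param writeIdFromSource writeId carried by the replicated stats, or <= 0 if absent
     * @param migrationWriteId  writeId prepared for a migrated table, or null if not set
     * @return the writeId to use, or empty when this path should not set one
     */
    static OptionalLong resolveWriteId(boolean inReplScope, boolean transactional,
                                       long writeIdFromSource, Long migrationWriteId) {
        // Explicit scope check: outside replication, this path neither crafts nor attaches a writeId.
        if (!inReplScope || !transactional) {
            return OptionalLong.empty();
        }
        // Transactional on both source and target: reuse the writeId replicated from the source.
        if (writeIdFromSource > 0) {
            return OptionalLong.of(writeIdFromSource);
        }
        // Non-transactional on source, migrated to transactional on target.
        if (migrationWriteId == null) {
            throw new IllegalStateException("Write id is not set for migration");
        }
        return OptionalLong.of(migrationWriteId);
    }

    public static void main(String[] args) {
        System.out.println(resolveWriteId(true, true, 7L, null));
        System.out.println(resolveWriteId(true, true, 0L, 12L));
        System.out.println(resolveWriteId(false, true, 7L, null));
    }
}
```

   The point of the sketch is that the replication condition is stated directly, so a future non-repl caller would skip the writeId crafting instead of relying on `work.getColStats() != null` to imply replication.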
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 221241)
    Time Spent: 11h  (was: 10h 50m)

> Stats replication for ACID tables.
> ----------------------------------
>
>                 Key: HIVE-21109
>                 URL: https://issues.apache.org/jira/browse/HIVE-21109
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Ashutosh Bapat
>            Assignee: Ashutosh Bapat
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-21109.01.patch, HIVE-21109.02.patch, HIVE-21109.03.patch, HIVE-21109.04.patch, HIVE-21109.05.patch, HIVE-21109.06.patch, HIVE-21109.07.patch, HIVE-21109.08.patch
>
>          Time Spent: 11h
>  Remaining Estimate: 0h
>
> Transactional tables require a writeId associated with the stats update. This writeId needs to be in sync with the writeId on the source and hence needs to be replicated from the source.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
