hive-issues mailing list archives

From "ASF GitHub Bot (Jira)" <>
Subject [jira] [Work logged] (HIVE-23956) Delete delta directory file information should be pushed to execution side
Date Mon, 03 Aug 2020 15:05:00 GMT


ASF GitHub Bot logged work on HIVE-23956:

                Author: ASF GitHub Bot
            Created on: 03/Aug/20 15:04
            Start Date: 03/Aug/20 15:04
    Worklog Time Spent: 10m 
      Work Description: pvargacl commented on a change in pull request #1339:

File path: ql/src/java/org/apache/hadoop/hive/ql/io/orc/
@@ -1641,28 +1645,26 @@ public int compareTo(CompressedOwid other) {
      * Check if the delete delta folder needs to be scanned for a given split's min/max write
      * @param orcSplitMinMaxWriteIds
-     * @param deleteDeltaDir
+     * @param deleteDelta
+     * @param stmtId statementId of the deleteDelta if present
      * @return true when  delete delta dir has to be scanned.
     protected static boolean isQualifiedDeleteDeltaForSplit(AcidOutputFormat.Options orcSplitMinMaxWriteIds,
-        Path deleteDeltaDir)
-    {
-      AcidUtils.ParsedDelta deleteDelta = AcidUtils.parsedDelta(deleteDeltaDir, false);
+        AcidInputFormat.DeltaMetaData deleteDelta, Integer stmtId) {

Review comment:
       This is the second line of parameters, so there should be no extra space here.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:

Issue Time Tracking

    Worklog Id:     (was: 465739)
    Time Spent: 4h 10m  (was: 4h)

> Delete delta directory file information should be pushed to execution side
> --------------------------------------------------------------------------
>                 Key: HIVE-23956
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Peter Varga
>            Assignee: Peter Varga
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 4h 10m
>  Remaining Estimate: 0h
> Since HIVE-23840, the LLAP cache is used to retrieve the tail of the ORC bucket files in the
delete deltas, but to use the cache the fileId must be determined, so one more FileSystem
call is issued for each bucket.
> This fileId is already available during compilation in the AcidState calculation; we
should serialise it to the OrcSplit and remove the unnecessary FS calls.
> Furthermore, instead of sending the SyntheticFileId directly, we should pass the attemptId
rather than the standard path hash. This way both the path and the SyntheticFileId can be
calculated, and it will keep working even if move-free delete operations are introduced.
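
The idea in the description above can be illustrated with a minimal sketch. It is not the
code from the pull request: the class and method names are hypothetical, the synthetic id is
modelled simply as (path hash, modification time, length), and the bucket file naming
(bucket_<5 digits>, with an optional _<attemptId> suffix) is an assumption. The point is only
that, once the length, modification time and attemptId travel with the split, the execution
side can rebuild both the path and the id without another FileSystem call.

```java
import java.util.Objects;

// Hypothetical sketch: none of these names are Hive APIs.
final class DeleteDeltaFileIdSketch {

    /** Stand-in for a synthetic file id: (path hash, modification time, length). */
    static final class SyntheticId {
        final long pathHash;
        final long modTime;
        final long length;

        SyntheticId(long pathHash, long modTime, long length) {
            this.pathHash = pathHash;
            this.modTime = modTime;
            this.length = length;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof SyntheticId)) {
                return false;
            }
            SyntheticId other = (SyntheticId) o;
            return pathHash == other.pathHash && modTime == other.modTime && length == other.length;
        }

        @Override
        public int hashCode() {
            return Objects.hash(pathHash, modTime, length);
        }
    }

    /** Assumed naming: bucket_<5-digit bucket id>, plus _<attemptId> when one is present. */
    static String bucketPath(String deleteDeltaDir, int bucketId, Integer attemptId) {
        String name = String.format("bucket_%05d", bucketId);
        if (attemptId != null) {
            name = name + "_" + attemptId;
        }
        return deleteDeltaDir + "/" + name;
    }

    /**
     * Rebuilds the id purely from values serialised with the split
     * (length and modTime come from the compile-time listing), so no
     * file status lookup is needed on the execution side.
     */
    static SyntheticId syntheticId(String deleteDeltaDir, int bucketId, Integer attemptId,
                                   long length, long modTime) {
        String path = bucketPath(deleteDeltaDir, bucketId, attemptId);
        return new SyntheticId(path.hashCode(), modTime, length);
    }
}
```

In this sketch the attemptId, not a precomputed path hash, is what gets shipped: the reader
derives the file name from it and hashes the resulting path itself, so the same data yields
both the path and the id.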

This message was sent by Atlassian Jira
