hive-issues mailing list archives

From "Lefty Leverenz (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-11940) "INSERT OVERWRITE" query is very slow because it creates one "distcp" per file to copy data from staging directory to target directory
Date Sun, 03 Jan 2016 10:32:39 GMT

    [ https://issues.apache.org/jira/browse/HIVE-11940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15080389#comment-15080389 ]

Lefty Leverenz commented on HIVE-11940:
---------------------------------------

[~prasanth_j] committed this to branch-1 on Dec. 8, 2015 (commit 445ed86f2b51bdcf8beed5291b1eb11be4fd2b61),
so Fix Version/s should include 1.3.0.

> "INSERT OVERWRITE" query is very slow because it creates one "distcp" per file to copy data from staging directory to target directory
> --------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-11940
>                 URL: https://issues.apache.org/jira/browse/HIVE-11940
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.2.1
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>             Fix For: 2.0.0
>
>         Attachments: HIVE-11940.1.patch, HIVE-11940.2.patch
>
>
> When hive.exec.stagingdir is set to ".hive-staging", the staging directory is placed under the target directory when an "INSERT OVERWRITE" query runs, and Hive grabs all files under the staging directory and copies them ONE BY ONE to the target directory.
> When hive.exec.stagingdir is set to "/tmp/hive", Hive simply does a RENAME operation, which is instant.
> This happens with files that are not encrypted.
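The two behaviours described above can be illustrated with a small local-filesystem analogue. This is a sketch only, not Hive's code: Hive works through the Hadoop FileSystem API, while this uses Python's os/shutil. The function name `overwrite_from_staging` and its `copy_files` flag are hypothetical. It assumes the staging and target paths are on the same filesystem and that the staging directory holds only plain files. When staging lives under the target (the ".hive-staging" case), the target cannot be dropped wholesale, so files are moved out one at a time, either by rename (cheap metadata operation) or by copy (rewrites every byte, the slow path the issue reports). When staging lives elsewhere (the "/tmp/hive" case), one rename of the whole directory suffices.

```python
import os
import shutil


def overwrite_from_staging(staging_dir, target_dir, copy_files=False):
    """Publish staged output into target_dir, INSERT OVERWRITE style.

    Hypothetical helper for illustration. Assumes staging_dir and target_dir
    are on the same filesystem and staging_dir contains only plain files.
    Returns the number of move/copy operations performed.
    """
    staging = os.path.abspath(staging_dir)
    target = os.path.abspath(target_dir)
    nested = os.path.commonpath([staging, target]) == target

    if not nested:
        # Staging outside the target (e.g. "/tmp/hive"): drop the old target
        # and rename the whole staging directory into place in one step.
        if os.path.isdir(target):
            shutil.rmtree(target)
        os.replace(staging, target)
        return 1

    # Staging under the target (e.g. "<target>/.hive-staging"): clear old
    # data but keep any entry that is, or contains, the staging directory.
    for name in os.listdir(target):
        path = os.path.join(target, name)
        if os.path.commonpath([path, staging]) == path:
            continue
        if os.path.isdir(path):
            shutil.rmtree(path)
        else:
            os.remove(path)

    # Then move staged files into the target one at a time.
    ops = 0
    for name in sorted(os.listdir(staging)):
        src = os.path.join(staging, name)
        dst = os.path.join(target, name)
        if copy_files:
            # Slow path: rewrites every byte, like one copy job per file.
            shutil.copyfile(src, dst)
            os.remove(src)
        else:
            # Fast path: a metadata-only rename per file.
            os.replace(src, dst)
        ops += 1
    return ops
```

Both nested-case paths end with the same target contents; the difference is how much data is rewritten, which is why per-file copying scales with table size while renames do not.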



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
