hive-issues mailing list archives

From "Ashutosh Chauhan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-14270) Write temporary data to HDFS when doing inserts on tables located on S3
Date Fri, 05 Aug 2016 16:41:20 GMT

    [ https://issues.apache.org/jira/browse/HIVE-14270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15409688#comment-15409688 ]

Ashutosh Chauhan commented on HIVE-14270:
-----------------------------------------

There is also a use case where the data on S3 is much larger than what the HDFS cluster can hold.
Imagine a table of several TBs on S3 and a cluster of only 3 nodes with minimal space available for
processing. This is a corner case, but it may exist.
In such cases it is better to still use S3 (although slow) so that the query succeeds. Until
we figure out an automated way to detect such a scenario, one option could be to introduce
another boolean config variable to optionally use blob storage for the scratch dir even when blob storage
is detected.
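
To make the proposal concrete, here is a minimal, language-neutral sketch of the decision in Python. It is not Hive code: the scheme list, the function names, the `use_blobstore_as_scratch` flag, and the staging suffix are all assumptions chosen for illustration, and the real patch may structure this differently.

```python
from urllib.parse import urlparse

# Hypothetical list of URI schemes treated as blob storage (illustrative only).
BLOBSTORE_SCHEMES = {"s3", "s3a", "s3n"}


def is_blobstore(path: str) -> bool:
    """Return True when the path's URI scheme is in the blob storage list."""
    return urlparse(path).scheme.lower() in BLOBSTORE_SCHEMES


def choose_scratch_dir(table_path: str,
                       hdfs_scratch: str,
                       use_blobstore_as_scratch: bool = False) -> str:
    """Pick where intermediate data for an INSERT should be written.

    use_blobstore_as_scratch stands in for the proposed boolean config
    variable (hypothetical name). When it is False and the table lives on
    blob storage, temporary data goes to HDFS. When it is True, temporary
    data stays on blob storage, which is slower but lets the query succeed
    when the HDFS cluster is too small to hold it.
    """
    if is_blobstore(table_path) and not use_blobstore_as_scratch:
        return hdfs_scratch
    # Otherwise keep intermediate data next to the table
    # (the ".hive-staging" suffix is an illustrative assumption).
    return table_path.rstrip("/") + "/.hive-staging"


if __name__ == "__main__":
    table = "s3a://bucket/warehouse/t"
    hdfs = "hdfs://nn:8020/tmp/hive"
    print(choose_scratch_dir(table, hdfs))
    print(choose_scratch_dir(table, hdfs, use_blobstore_as_scratch=True))
```

With the flag left at its default, an S3 table gets an HDFS scratch dir; setting it to true covers the small-cluster corner case described above by keeping the scratch dir on S3.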

> Write temporary data to HDFS when doing inserts on tables located on S3
> -----------------------------------------------------------------------
>
>                 Key: HIVE-14270
>                 URL: https://issues.apache.org/jira/browse/HIVE-14270
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>         Attachments: HIVE-14270.1.patch, HIVE-14270.2.patch, HIVE-14270.3.patch, HIVE-14270.4.patch
>
>
> Currently, when running INSERT statements on tables located on S3, Hive also writes and reads temporary (or intermediate) files on S3.
> If HDFS is still the default filesystem in Hive, then we can keep such temporary files on HDFS so that queries run faster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
