hive-issues mailing list archives

From "Vineet Garg (JIRA)" <>
Subject [jira] [Commented] (HIVE-17935) Turn on hive.optimize.sort.dynamic.partition by default
Date Wed, 12 Dec 2018 00:46:00 GMT


Vineet Garg commented on HIVE-17935:

[~asherman] Since this optimization is now turned on by default (HIVE-20703 & HIVE-20915),
I don't believe we need this JIRA anymore. Is it OK to close it?

> Turn on hive.optimize.sort.dynamic.partition by default
> -------------------------------------------------------
>                 Key: HIVE-17935
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Andrew Sherman
>            Priority: Major
>         Attachments: HIVE-17935.1.patch, HIVE-17935.2.patch, HIVE-17935.3.patch, HIVE-17935.4.patch,
HIVE-17935.5.patch, HIVE-17935.6.patch, HIVE-17935.7.patch, HIVE-17935.8.patch
> The config option hive.optimize.sort.dynamic.partition is an optimization for Hive’s
dynamic partitioning feature. It was originally implemented in HIVE-6455.
With this optimization, the dynamic partition columns and bucketing columns (in the case of bucketed
tables) are sorted before being fed to the reducers. Since the partitioning and bucketing
columns are sorted, each reducer can keep only one record writer open at any time, thereby
reducing the memory pressure on the reducers. There were some early problems with this optimization,
and it was disabled by default in HiveConf in HIVE-8151.
Since then, setting hive.optimize.sort.dynamic.partition=true has been used to solve problems
where dynamic partitioning produces (1) too many small files on HDFS, which is bad for
the cluster and can increase overhead for future Hive queries over those partitions, and (2)
OOM issues in the map tasks because they are trying to write to 100 different files simultaneously.
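
To illustrate why sorting on the partition columns reduces memory pressure, here is a
toy model in Python. It is not Hive's implementation: the row shape and both function
names are hypothetical, and a "writer" is only counted, never opened. It compares the
peak number of simultaneously open record writers when rows arrive in arbitrary order
with the peak when rows arrive sorted by partition key.

```python
from typing import Iterable, Optional, Set, Tuple

# Hypothetical row shape for this sketch: (partition_key, payload).
Row = Tuple[str, str]


def peak_writers_unsorted(rows: Iterable[Row]) -> int:
    """Rows arrive in arbitrary order, so a writer stays open for every
    partition seen so far (the task cannot know whether a key will recur)."""
    open_writers: Set[str] = set()
    peak = 0
    for key, _payload in rows:
        open_writers.add(key)
        peak = max(peak, len(open_writers))
    return peak


def peak_writers_sorted(rows: Iterable[Row]) -> int:
    """Rows are sorted by partition key first, so when the key changes the
    previous writer can be closed before the next one is opened."""
    current: Optional[str] = None
    open_count = 0
    peak = 0
    for key, _payload in sorted(rows, key=lambda row: row[0]):
        if key != current:
            open_count = 1  # previous writer (if any) closed, new one opened
            current = key
        peak = max(peak, open_count)
    return peak


sample = [
    ("2018-12-01", "a"),
    ("2018-12-02", "b"),
    ("2018-12-01", "c"),
    ("2018-12-03", "d"),
]
print(peak_writers_unsorted(sample), peak_writers_sorted(sample))
```

In this model the unsorted peak grows with the number of distinct partitions a task
sees, while the sorted peak stays at one, which is the effect the description attributes
to the optimization.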

> It now seems that the feature is probably mature enough that it can be enabled by default.

This message was sent by Atlassian JIRA
