hive-issues mailing list archives

From "Szehon Ho (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-20153) Count and Sum UDF consume more memory in Hive 2+
Date Thu, 12 Jul 2018 16:38:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-20153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szehon Ho updated HIVE-20153:
-----------------------------
    Description: 
While testing Hive 2, we noticed that queries with many count() and sum() aggregations
run out of memory on the Hadoop side much faster than in Hive 1.  In many queries, we have to
double the memory.

Taking a heap dump, we see that one of the main culprits is the field 'uniqueObjects' in GenericUDAFSum
and GenericUDAFCount, which was added to support window functions.

  was: While testing Hive 2, we noticed that queries with many count() and sum() aggregations
run out of memory on the Hadoop side much faster than in Hive 1.  Taking a heap dump, we see that one
of the main culprits is the field 'uniqueObjects' in GenericUDAFSum and GenericUDAFCount,
which was added to support window functions.


> Count and Sum UDF consume more memory in Hive 2+
> ------------------------------------------------
>
>                 Key: HIVE-20153
>                 URL: https://issues.apache.org/jira/browse/HIVE-20153
>             Project: Hive
>          Issue Type: Bug
>          Components: UDF
>    Affects Versions: 2.3.2
>            Reporter: Szehon Ho
>            Priority: Major
>
> While testing Hive 2, we noticed that queries with many count() and sum() aggregations
> run out of memory on the Hadoop side much faster than in Hive 1.  In many queries, we have to
> double the memory.
>
> Taking a heap dump, we see that one of the main culprits is the field 'uniqueObjects' in GenericUDAFSum
> and GenericUDAFCount, which was added to support window functions.
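
To illustrate how a per-buffer field like 'uniqueObjects' could add memory even for plain
aggregations, here is a minimal sketch. It assumes (the message does not confirm this) that
each aggregation buffer allocates its own set up front. All class and method names below are
hypothetical and are not Hive's actual code. The sketch counts how many sets exist after
running a non-distinct sum over many groups, once with eager and once with lazy allocation.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; these are not Hive classes.
class UniqueObjectsSketch {

    /** Buffer that always allocates its distinct-tracking set (assumed problem pattern). */
    static final class EagerSumBuffer {
        long sum;
        // Allocated for every buffer, even when no distinct tracking is needed.
        final Set<Object> uniqueObjects = new HashSet<>();

        void add(long value) {
            sum += value;
        }
    }

    /** Buffer that allocates the set only when distinct tracking is requested. */
    static final class LazySumBuffer {
        long sum;
        Set<Object> uniqueObjects; // stays null for a plain sum()

        void add(long value, boolean distinct) {
            if (distinct) {
                if (uniqueObjects == null) {
                    uniqueObjects = new HashSet<>();
                }
                if (!uniqueObjects.add(value)) {
                    return; // value already counted, skip it
                }
            }
            sum += value;
        }
    }

    /** Runs a plain sum over `groups` buffers and counts the sets that were allocated. */
    static int allocatedSetsEager(int groups) {
        int allocated = 0;
        for (int g = 0; g < groups; g++) {
            EagerSumBuffer buffer = new EagerSumBuffer();
            buffer.add(g);
            if (buffer.uniqueObjects != null) {
                allocated++;
            }
        }
        return allocated;
    }

    /** Same workload with lazy allocation; no set is created for a non-distinct sum. */
    static int allocatedSetsLazy(int groups) {
        int allocated = 0;
        for (int g = 0; g < groups; g++) {
            LazySumBuffer buffer = new LazySumBuffer();
            buffer.add(g, false);
            if (buffer.uniqueObjects != null) {
                allocated++;
            }
        }
        return allocated;
    }

    /** Distinct sum using the lazy buffer: duplicates are added only once. */
    static long distinctSum(long... values) {
        LazySumBuffer buffer = new LazySumBuffer();
        for (long v : values) {
            buffer.add(v, true);
        }
        return buffer.sum;
    }

    public static void main(String[] args) {
        System.out.println("eager sets allocated: " + allocatedSetsEager(1000));
        System.out.println("lazy sets allocated:  " + allocatedSetsLazy(1000));
        System.out.println("distinct sum of 5,5,3: " + distinctSum(5, 5, 3));
    }
}
```

With one buffer per group and per aggregation, an eagerly allocated set costs one extra object
per buffer, which would be consistent with the roughly doubled memory described above. Whether
this matches the actual cause should be checked against the heap dump.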



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
