hive-issues mailing list archives

From Jürgen Thomann (JIRA) <j...@apache.org>
Subject [jira] [Commented] (HIVE-13531) Cache in json_tuple UDF grows larger than it should
Date Wed, 18 May 2016 13:18:12 GMT

    [ https://issues.apache.org/jira/browse/HIVE-13531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15288935#comment-15288935
] 

Jürgen Thomann commented on HIVE-13531:
---------------------------------------

I investigated the problem a bit more after the second heap dump, and it can be
reproduced if this UDF is used in multiple queries at the same time.

I'm not sure which is the best way to solve this problem, but there are at least two possible
fixes.
1. Change the HashCache to a synchronized Map, which is easily done with Collections.synchronizedMap.
2. Remove the static from the declaration of jsonObjectCache. I'm not sure why it is static,
but if two different queries use json_tuple they currently share the same cache, which
reduces the effective cache size for each query.
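
A minimal sketch of fix 1, not the actual Hive code: a LinkedHashMap-based cache bounded
through removeEldestEntry and wrapped with Collections.synchronizedMap. The constants are
the ones quoted in this comment; the class name BoundedCacheSketch, the helper methods and
the constructor arguments are assumptions made for illustration.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class BoundedCacheSketch {
    // Values quoted in this thread; everything else in this class is hypothetical.
    static final int INIT_SIZE = 32;
    static final int CACHE_SIZE = 16;
    static final float LOAD_FACTOR = 0.6f;

    /** LinkedHashMap that drops its eldest entry once it holds more than CACHE_SIZE entries. */
    static class HashCache<K, V> extends LinkedHashMap<K, V> {
        private static final long serialVersionUID = 1L;

        HashCache() {
            super(INIT_SIZE, LOAD_FACTOR);
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > CACHE_SIZE;
        }
    }

    /** Fix 1: every call on the returned map is serialized on a single lock. */
    static <K, V> Map<K, V> newSynchronizedCache() {
        return Collections.synchronizedMap(new HashCache<K, V>());
    }

    /** Writes distinct keys from several threads and returns the final cache size. */
    static int fillConcurrently(Map<String, Object> cache, int threads, int putsPerThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < putsPerThread; i++) {
                    cache.put("json-" + id + "-" + i, new Object());
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return cache.size();
    }

    public static void main(String[] args) {
        Map<String, Object> cache = newSynchronizedCache();
        System.out.println("entries after concurrent puts: " + fillConcurrently(cache, 4, 10000));
    }
}
```

With the wrapper in place the eviction in removeEldestEntry runs under the same lock as the
put, so the map cannot grow past CACHE_SIZE. Fix 2 would instead give each UDTF instance its
own cache, which only avoids the problem if a single instance is never used by several
threads at once.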

Another thing is the use of INIT_SIZE = 32 and CACHE_SIZE = 16 with a load factor of 0.6f.
Wouldn't it make more sense to increase the load factor to nearly one and increase the CACHE_SIZE
to 28 or something in that area?
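
The arithmetic behind that suggestion can be sketched as follows. This assumes the Java 8
java.util.HashMap behaviour (table size rounded up to a power of two, resize once the entry
count exceeds capacity * loadFactor); the class LoadFactorSketch and its method names are
hypothetical, and 0.95f is only one example of a load factor "nearly one".

```java
public class LoadFactorSketch {
    /** Smallest power of two >= the requested capacity, as HashMap sizes its table. */
    static int tableSize(int initialCapacity) {
        int n = 1;
        while (n < initialCapacity) {
            n <<= 1;
        }
        return n;
    }

    /** Entry count above which the table is resized: capacity * loadFactor, truncated. */
    static int resizeThreshold(int initialCapacity, float loadFactor) {
        return (int) (tableSize(initialCapacity) * loadFactor);
    }

    /**
     * The cache briefly holds cacheSize + 1 entries during a put, before the eldest
     * entry is removed, so that is the size that must stay within the threshold.
     */
    static boolean fitsWithoutResize(int initialCapacity, float loadFactor, int cacheSize) {
        return cacheSize + 1 <= resizeThreshold(initialCapacity, loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(resizeThreshold(32, 0.6f));       // 19
        System.out.println(resizeThreshold(32, 0.95f));      // 30
        System.out.println(fitsWithoutResize(32, 0.95f, 28)); // true
    }
}
```

Under these assumptions the current settings leave 13 of the 32 buckets' worth of headroom
unused, while INIT_SIZE = 32 with a load factor of 0.95f would hold 28 entries without ever
resizing the table.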

> Cache in json_tuple UDF grows larger than it should
> ---------------------------------------------------
>
>                 Key: HIVE-13531
>                 URL: https://issues.apache.org/jira/browse/HIVE-13531
>             Project: Hive
>          Issue Type: Bug
>          Components: UDF
>    Affects Versions: 1.1.0
>         Environment: CDH 5.5.0 with Java 1.8.0_45
>            Reporter: Jürgen Thomann
>            Assignee: Jason Dere
>            Priority: Minor
>
> According to the code in ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFJSONTuple.java
> the HashCache should never grow larger than 16 entries. In the last OOM of Hive Server 2 I
> found this HashCache with over 1 million java.util.LinkedHashMap$Entry objects.
> The code looks right and works as it should single-threaded when I tested it in isolation.
> The only problem I can imagine, with my limited knowledge of the Hive source code, is that it is
> accessed concurrently and that the cleanup with removeEldestEntry does not work in that case.
> I had this problem with Hive 1.1.0, but the current implementation of the HashCache in master
> looks the same.



