hive-issues mailing list archives

From "Misha Dmitriev (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-19937) Intern JobConf objects in Spark tasks
Date Sat, 30 Jun 2018 19:55:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-19937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16528876#comment-16528876
] 

Misha Dmitriev commented on HIVE-19937:
---------------------------------------

[~stakiar] regarding the behavior of {{CopyOnFirstWriteProperties}}: such fine-grained behavior
would be easy to implement. It would require changing the implementation of this class so that
it holds references to two hashtables: one for properties that are specific to the given
instance of {{COFWP}}, and another for properties that are common (default) to all instances
of {{COFWP}}. Each get() call should check the specific hashtable first and then fall back to the
default hashtable, and each put() call should touch only the specific hashtable.
This makes sense when there is a sufficiently large number of common properties
but every, or almost every, table also has some specific properties. In contrast, the current {{CopyOnFirstWriteProperties}}
works best when most tables are exactly the same and only a few differ. Having
written all this, I realize that the proposed implementation of {{COFWP}} would probably
be better in all scenarios. But before deciding on anything, we should definitely measure
where the memory goes in realistic scenarios.
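
The two-hashtable lookup described above could be sketched roughly as follows. This is only an illustration, not the actual {{CopyOnFirstWriteProperties}} code: the class name {{TwoTableProperties}} and its methods are hypothetical, and it uses plain String maps rather than extending {{java.util.Properties}}.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed design; NOT the real Hive class.
final class TwoTableProperties {
    // Properties common to all instances. Shared by reference, never written through this class.
    private final Map<String, String> defaults;
    // Properties specific to this instance only.
    private final Map<String, String> specific = new HashMap<>();

    TwoTableProperties(Map<String, String> sharedDefaults) {
        this.defaults = sharedDefaults;
    }

    // Check the specific table first, then fall back to the shared defaults.
    String get(String key) {
        String value = specific.get(key);
        return value != null ? value : defaults.get(key);
    }

    // Writes go only to the specific table, so the shared defaults stay untouched.
    String put(String key, String value) {
        String previous = get(key);
        specific.put(key, value);
        return previous;
    }

    // Number of entries stored per instance, i.e. the per-instance overhead.
    int specificSize() {
        return specific.size();
    }
}
```

With this layout, an instance that overrides a single property stores one extra entry instead of a full copy of the table. A real implementation would also have to handle removal of a default key and iteration over the merged view, which this sketch leaves out.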

Regarding interning only values in {{PartitionDesc#internProperties}}: yes, I think this
was intentional. I carefully analyzed heap dumps before making this change, so if it had been
worth interning the keys, I would have done that too. Most probably, when these tables are
created, the key Strings already come from a source where they are interned.
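
For illustration, interning only the values of a {{Properties}} table might look like the sketch below. This is not the actual {{PartitionDesc#internProperties}} code: the helper name {{ValueInterner}} is hypothetical, and it assumes plain {{String.intern()}}, whereas the real code may use a different interner.

```java
import java.util.Map;
import java.util.Properties;

// Hypothetical helper; NOT the real PartitionDesc#internProperties.
final class ValueInterner {
    // Replace each String value with its canonical interned instance.
    // Keys are left alone, on the assumption that they are already interned.
    static void internValues(Properties props) {
        for (Map.Entry<Object, Object> entry : props.entrySet()) {
            Object value = entry.getValue();
            if (value instanceof String) {
                entry.setValue(((String) value).intern());
            }
        }
    }
}
```

After this runs on two tables with equal values, both tables reference the same String instance for each such value, so the duplicate copies become eligible for garbage collection.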

> Intern JobConf objects in Spark tasks
> -------------------------------------
>
>                 Key: HIVE-19937
>                 URL: https://issues.apache.org/jira/browse/HIVE-19937
>             Project: Hive
>          Issue Type: Improvement
>          Components: Spark
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>         Attachments: HIVE-19937.1.patch
>
>
> When fixing HIVE-16395, we decided that each new Spark task should clone the {{JobConf}}
object to prevent any {{ConcurrentModificationException}} from being thrown. However, setting
this variable comes at a cost of storing a duplicate {{JobConf}} object for each Spark task.
These objects can take up a significant amount of memory, so we should intern them so that Spark
tasks running in the same JVM don't store duplicate copies.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
