hive-issues mailing list archives

From "Sahil Takiar (JIRA)" <>
Subject [jira] [Commented] (HIVE-19937) Intern JobConf objects in Spark tasks
Date Fri, 29 Jun 2018 01:29:00 GMT


Sahil Takiar commented on HIVE-19937:

Thanks for the review Xuefu!

[] could you take a quick look too? I just want to make sure I am doing this
correctly. The high-level explanation is that in Hive on Spark, a single Spark executor (a
single JVM) can run multiple Spark tasks in parallel. Each task creates its own copy of the {{JobConf}}
(essentially all the Hive properties). This duplication can add significant memory overhead in the JVM, so this
patch interns all the entries in each {{JobConf}}.
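The idea can be sketched in plain Java. This is only an illustration, not the actual patch: the {{JobConf}} is approximated by a {{Map<String, String>}}, and {{String.intern()}} stands in for whatever interner the patch actually uses. The point is that after interning, cloned configurations in the same JVM share one {{String}} instance per distinct key and value.

```java
import java.util.HashMap;
import java.util.Map;

public class InternDemo {
    // Return a copy of the property map with every key and value interned,
    // so that multiple clones of the same configuration share a single
    // String instance per distinct key/value within the JVM.
    static Map<String, String> internEntries(Map<String, String> props) {
        Map<String, String> interned = new HashMap<>(props.size());
        for (Map.Entry<String, String> e : props.entrySet()) {
            interned.put(e.getKey().intern(), e.getValue().intern());
        }
        return interned;
    }

    public static void main(String[] args) {
        // Two "cloned" JobConf-like maps with identical contents but
        // distinct String objects for the values.
        Map<String, String> conf1 = new HashMap<>();
        conf1.put("hive.execution.engine", new String("spark"));
        Map<String, String> conf2 = new HashMap<>();
        conf2.put("hive.execution.engine", new String("spark"));

        Map<String, String> i1 = internEntries(conf1);
        Map<String, String> i2 = internEntries(conf2);

        // After interning, both maps reference the same String instance,
        // so the duplicate value no longer costs extra heap.
        System.out.println("shared reference: "
            + (i1.get("hive.execution.engine") == i2.get("hive.execution.engine")));
    }
}
```

With many tasks per executor and thousands of Hive properties per {{JobConf}}, deduplicating the strings this way is where the memory savings come from.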

> Intern JobConf objects in Spark tasks
> -------------------------------------
>                 Key: HIVE-19937
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>          Components: Spark
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>         Attachments: HIVE-19937.1.patch
> When fixing HIVE-16395, we decided that each new Spark task should clone the {{JobConf}}
object to prevent any {{ConcurrentModificationException}} from being thrown. However, cloning
comes at the cost of storing a duplicate {{JobConf}} object for each Spark task.
These objects can take up a significant amount of memory; we should intern their entries so that Spark
tasks running in the same JVM don't store duplicate copies.
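The rationale from HIVE-16395 quoted above can be illustrated with a toy example. This is a simplification, not Hive code: the shared {{JobConf}} is approximated by a plain {{HashMap}}, and the second "task" is just a write performed while the first one iterates. Because {{HashMap}} iterators are fail-fast, the concurrent write surfaces as a {{ConcurrentModificationException}}, which is why each task got its own clone.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class SharedConfDemo {
    // Returns true if mutating the map while iterating it triggers a
    // ConcurrentModificationException -- the failure mode that can occur
    // when parallel tasks share a single configuration object.
    static boolean mutationDuringIterationFails(Map<String, String> conf) {
        try {
            for (String key : conf.keySet()) {
                // Simulates a second task writing to the shared config
                // while the first task is still reading it.
                conf.put("some.new.key", "value");
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, String> shared = new HashMap<>();
        shared.put("hive.exec.scratchdir", "/tmp/hive");
        shared.put("hive.execution.engine", "spark");
        System.out.println("CME observed: " + mutationDuringIterationFails(shared));

        // Cloning per task, as HIVE-16395 chose to do, sidesteps the problem:
        // each task iterates and mutates its own copy.
        Map<String, String> perTaskClone = new HashMap<>(shared);
    }
}
```

Cloning buys safety at the price of duplicated entries per task; interning those entries (this patch) recovers most of that memory without giving the safety back.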

This message was sent by Atlassian JIRA
