hive-issues mailing list archives

From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-19668) Over 30% of the heap wasted by duplicate org.antlr.runtime.CommonToken's and duplicate strings
Date Tue, 05 Jun 2018 22:42:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-19668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502604#comment-16502604 ]

Hive QA commented on HIVE-19668:
--------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 42s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 56s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 41s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  3m 19s{color} | {color:blue} ql in master has 2280 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 48s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 55s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 39s{color} | {color:red} ql: The patch generated 5 new + 720 unchanged - 0 fixed = 725 total (was 720) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 50s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 11s{color} | {color:red} The patch generated 2 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 20m 15s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-11536/dev-support/hive-personality.sh |
| git revision | master / afc5fa4 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-11536/yetus/diff-checkstyle-ql.txt |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-11536/yetus/patch-asflicense-problems.txt |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-11536/yetus.txt |
| Powered by | Apache Yetus    http://yetus.apache.org |


This message was automatically generated.



> Over 30% of the heap wasted by duplicate org.antlr.runtime.CommonToken's and duplicate strings
> ----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-19668
>                 URL: https://issues.apache.org/jira/browse/HIVE-19668
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>    Affects Versions: 3.0.0
>            Reporter: Misha Dmitriev
>            Assignee: Misha Dmitriev
>            Priority: Major
>         Attachments: HIVE-19668.01.patch, image-2018-05-22-17-41-39-572.png
>
>
> I've recently analyzed an HS2 heap dump, obtained when there was a huge memory spike during
compilation of a big query. The analysis was done with jxray ([www.jxray.com|http://www.jxray.com/]).
It turns out that more than 90% of the 20G heap was used by data structures associated with
query parsing ({{org.apache.hadoop.hive.ql.parse.QBExpr}}). There are probably multiple opportunities
for optimizations here. One of them is to stop the code from creating duplicate instances
of {{org.antlr.runtime.CommonToken}} class. See a sample of these objects in the attached
image:
> !image-2018-05-22-17-41-39-572.png|width=879,height=399!
> It looks like these particular {{CommonToken}} objects are constants that don't change
once created. I see some code, e.g. in {{org.apache.hadoop.hive.ql.parse.CalcitePlanner}},
where such objects are apparently created repeatedly, e.g. with {{new CommonToken(HiveParser.TOK_INSERT, "TOK_INSERT")}}.
If these 33 token kinds are instead created once and reused, we will save
more than 1/10th of the heap in this scenario. Plus, since these objects are small but very
numerous, getting rid of them will remove a great deal of pressure from the GC.
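
A minimal sketch of the create-once-and-reuse idea described above. It is not taken from the attached patch. {{SimpleToken}} and {{TokenCache}} are hypothetical names, and {{SimpleToken}} is an immutable stand-in for {{CommonToken}} so that the example compiles without ANTLR on the classpath.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for org.antlr.runtime.CommonToken, kept immutable here
// so that handing the same instance to many callers is safe.
final class SimpleToken {
    private final int type;
    private final String text;

    SimpleToken(int type, String text) {
        this.type = type;
        this.text = text;
    }

    int getType() { return type; }
    String getText() { return text; }
}

// Hypothetical cache holding one shared token per token type.
final class TokenCache {
    private static final Map<Integer, SimpleToken> CACHE = new ConcurrentHashMap<>();

    private TokenCache() {}

    // Returns the shared token for this type, creating it on first use.
    // Assumes a given type is always requested with the same text.
    static SimpleToken get(int type, String text) {
        return CACHE.computeIfAbsent(type, t -> new SimpleToken(t, text));
    }
}
```

With something like this in place, a call site would use {{TokenCache.get(HiveParser.TOK_INSERT, "TOK_INSERT")}} instead of {{new CommonToken(...)}}. The real {{CommonToken}} has setters, so sharing instances is only safe where no caller modifies the token after obtaining it.
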
> Another source of waste is duplicate strings, which collectively waste 26.1% of memory.
Some of them come from CommonToken objects that have the same text (i.e. for multiple CommonToken
objects the contents of their 'text' Strings are the same, but each has its own copy of that
String). Other duplicate strings come from other sources that are easy enough to fix by adding
String.intern() calls.
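
As a rough illustration of the {{String.intern()}} suggestion: two strings built separately at run time are distinct objects even when their contents match, and interning maps both to one canonical instance. {{InternDemo}}, {{fresh}} and {{dedup}} are hypothetical names used only for this sketch; where such calls would go in Hive is not shown here.

```java
// Hypothetical helper class illustrating string de-duplication via intern().
final class InternDemo {
    private InternDemo() {}

    // Builds a new String object at run time, similar to text produced while
    // parsing a query (as opposed to a compile-time literal).
    static String fresh(String s) {
        return new String(s.toCharArray());
    }

    // Returns the canonical pooled instance with the same contents, or null.
    static String dedup(String s) {
        return s == null ? null : s.intern();
    }
}
```

Two results of {{fresh("default")}} are equal but not the same object, while passing each through {{dedup}} yields the same reference, so only one copy needs to stay reachable.
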



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
