hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-21466) Increase Default Size of SPLIT_MAXSIZE
Date Tue, 19 Mar 2019 04:43:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-21466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16795672#comment-16795672
] 

Hive QA commented on HIVE-21466:
--------------------------------

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} |
{color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 59s{color}
| {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 20s{color} |
{color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 16s{color}
| {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 34s{color} | {color:blue}
common in master has 63 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 14s{color} |
{color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 19s{color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 18s{color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 18s{color} | {color:green}
the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 16s{color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color}
| {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 44s{color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 14s{color} |
{color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 14s{color}
| {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 12m 50s{color} | {color:black}
{color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03)
x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-16566/dev-support/hive-personality.sh
|
| git revision | master / 36bd89d |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| modules | C: common U: common |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-16566/yetus.txt |
| Powered by | Apache Yetus    http://yetus.apache.org |


This message was automatically generated.



> Increase Default Size of SPLIT_MAXSIZE
> --------------------------------------
>
>                 Key: HIVE-21466
>                 URL: https://issues.apache.org/jira/browse/HIVE-21466
>             Project: Hive
>          Issue Type: Improvement
>          Components: Configuration
>    Affects Versions: 4.0.0, 3.2.0
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Minor
>         Attachments: HIVE-21466.1.patch, HIVE-21466.1.patch
>
>
> {code:java}
>  MAPREDMAXSPLITSIZE(FileInputFormat.SPLIT_MAXSIZE, 256000000L, "", true),
> {code}
> [https://github.com/apache/hive/blob/8d4300a02691777fc96f33861ed27e64fed72f2c/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L682]
> This field specifies a maximum size for each MR (maybe other?) splits.
> This number should be a multiple of the HDFS Block size. The way that this maximum is
implemented, is that each block is added to the split, and if the split grows to be larger
than the maximum allowed, the split is submitted to the cluster and a new split is opened.
> So, imagine the following scenario:
>  * HDFS block size of 16 bytes
>  * Maximum size of 40 bytes
> This will produce a split with 3 blocks. (2x16) = 32; another block will be inserted,
(3x16) = 48 bytes in the split. So, while many operators would assume a split of 2 blocks,
the actual is 3 blocks. Setting the maximum split size to a multiple of the HDFS block size
will make this behavior less confusing.
> The current setting is ~256MB and when this was introduced, the default HDFS block size
was 64MB. That is a factor of 4x. However, now HDFS block sizes are 128MB by default, so I
propose setting this to 4x128MB.  The larger splits (fewer tasks) should give a nice performance
boost for modern hardware.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message