hive-issues mailing list archives

From "BELUGA BEHR (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-16758) Better Select Number of Replications
Date Thu, 01 Jun 2017 15:42:04 GMT

    [ https://issues.apache.org/jira/browse/HIVE-16758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033176#comment-16033176 ]

BELUGA BEHR commented on HIVE-16758:
------------------------------------

Additionally, I don't think HDFS clients typically have access to the value of "dfs.replication.max".
I think that setting is normally present only in the HDFS NameNode configuration and not in
the HDFS client configurations.  That means the client will always see the default value of 512
for "dfs.replication.max", which in turn means that the replication value specified here will
always be 10.  That's a problem for a small cluster (fewer than 10 nodes) that has set
"dfs.replication.max" to a value below 10 at the NameNodes.


https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
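
To make the concern concrete, here is a minimal sketch of the lookup as the client would see it. It does not use the Hadoop API: a plain Map stands in for the client-side Configuration, and the class name, method name, and the NameNode value of 3 are hypothetical. The 512 fallback is the hdfs-default.xml default mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the client-side lookup; not the actual Hive code.
final class ReplicationCapDemo {
    static final String DFS_REPLICATION_MAX = "dfs.replication.max";
    static final int HARD_CODED_REPLICATION = 10; // the value hard-coded in the operator
    static final int HDFS_DEFAULT_MAX = 512;      // default from hdfs-default.xml

    // Same shape as the quoted logic: cap the hard-coded value by whatever
    // dfs.replication.max the client configuration can see.
    static int effectiveReplication(Map<String, Integer> clientConf) {
        int dfsMax = clientConf.getOrDefault(DFS_REPLICATION_MAX, HDFS_DEFAULT_MAX);
        return Math.min(HARD_CODED_REPLICATION, dfsMax);
    }

    public static void main(String[] args) {
        // Assumption: the client configuration carries no override,
        // while the NameNode has been configured with a lower limit.
        Map<String, Integer> clientConf = new HashMap<>();
        int nameNodeMax = 3; // hypothetical NameNode-side setting

        int requested = effectiveReplication(clientConf);
        System.out.println("requested replication = " + requested);
        System.out.println("exceeds NameNode limit = " + (requested > nameNodeMax));
    }
}
```

With an empty client configuration the cap never takes effect, so the requested replication stays at the hard-coded value even though the NameNode limit is lower.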

> Better Select Number of Replications
> ------------------------------------
>
>                 Key: HIVE-16758
>                 URL: https://issues.apache.org/jira/browse/HIVE-16758
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: BELUGA BEHR
>            Priority: Minor
>
> {{org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.java}}
> We should be smarter about how we pick a replication number.  We should add a new configuration
> equivalent to {{mapreduce.client.submit.file.replication}}.  This value should be around the
> square root of the number of nodes and not hard-coded in the code.
> {code}
> public static final String DFS_REPLICATION_MAX = "dfs.replication.max";
> private int minReplication = 10;
>   @Override
>   protected void initializeOp(Configuration hconf) throws HiveException {
> ...
>     int dfsMaxReplication = hconf.getInt(DFS_REPLICATION_MAX, minReplication);
>     // minReplication value should not cross the value of dfs.replication.max
>     minReplication = Math.min(minReplication, dfsMaxReplication);
>   }
> {code}
> https://hadoop.apache.org/docs/r2.7.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
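
The square-root suggestion in the issue description could be sketched roughly as follows. This is only an illustration under assumptions: the class and method names are hypothetical, the node count and the maximum replication are taken as plain parameters (how Hive would obtain them is not addressed here), and rounding up is one possible reading of "around the square root".

```java
// Hypothetical helper; not part of Hive or Hadoop.
final class ReplicationHeuristic {

    // Suggests a replication factor near sqrt(nodeCount), never above the
    // number of nodes or the configured maximum, and never below 1.
    static int suggestedReplication(int nodeCount, int maxReplication) {
        if (nodeCount < 1 || maxReplication < 1) {
            throw new IllegalArgumentException("nodeCount and maxReplication must be >= 1");
        }
        int sqrt = (int) Math.ceil(Math.sqrt(nodeCount));
        int cap = Math.min(nodeCount, maxReplication);
        return Math.max(1, Math.min(sqrt, cap));
    }

    public static void main(String[] args) {
        int[] clusterSizes = {1, 4, 9, 100};
        for (int nodes : clusterSizes) {
            System.out.println(nodes + " nodes -> replication "
                + suggestedReplication(nodes, 512));
        }
    }
}
```

In a real change the result would presumably come from a new Hive configuration property, as the description proposes, with a heuristic like this serving only as guidance for choosing its value.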



