hive-issues mailing list archives

From Sergio Peña (JIRA) <>
Subject [jira] [Commented] (HIVE-15881) Use new thread count variable name instead of mapred.dfsclient.parallelism.max
Date Tue, 14 Feb 2017 22:21:41 GMT


Sergio Peña commented on HIVE-15881:

Great, thanks for the suggestions. [~poeppt] Although I like your idea of having 0 mean the
'maximum number allowable', I think we should keep 0 meaning that only one thread does the
work. The other thread-related configuration variables use 0 to disable the use of threads.
Let's keep it consistent.

I will submit a patch with the following:
- New variable name {{hive.exec.input.listing.max.threads}} for getInputSummary and getInputPaths
- Mark {{mapred.dfsclient.parallelism.max}} as deprecated, but continue using it.
- Default the value for {{hive.exec.input.listing.max.threads}} to 0 (no threads or just one
  thread). I think we should keep it disabled because on HDFS there's no benefit of using
  threads, and we can multiple RPC connections with the
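
As a rough illustration of the plan above (not the actual patch), the lookup could be sketched as follows. Only the two property names come from this thread. The class {{ListingThreads}}, the helper {{resolveListingThreads}}, and the use of a plain {{Map}} in place of the Hive/Hadoop configuration object are hypothetical, and letting the new name take precedence over the deprecated one is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a plain Map stands in for the Hive/Hadoop configuration object.
final class ListingThreads {
    // Property names taken from the discussion above.
    static final String NEW_KEY = "hive.exec.input.listing.max.threads";
    static final String DEPRECATED_KEY = "mapred.dfsclient.parallelism.max";

    // Returns the number of listing threads; 0 means "no thread pool, do the work in one thread".
    static int resolveListingThreads(Map<String, String> conf) {
        // Assumption: the new name wins when both are set.
        String value = conf.get(NEW_KEY);
        if (value == null) {
            // The deprecated name is still honored as a fallback.
            value = conf.get(DEPRECATED_KEY);
        }
        if (value == null) {
            return 0; // proposed default: threads disabled
        }
        // Negative values are treated like 0 in this sketch.
        return Math.max(0, Integer.parseInt(value.trim()));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(resolveListingThreads(conf)); // neither key set

        conf.put(DEPRECATED_KEY, "8");
        System.out.println(resolveListingThreads(conf)); // only the deprecated key set

        conf.put(NEW_KEY, "4");
        System.out.println(resolveListingThreads(conf)); // both keys set
    }
}
```

A caller would create a thread pool only when the returned value is greater than 0 and otherwise list the inputs in the calling thread, which matches how the other thread-related variables treat 0.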

> Use new thread count variable name instead of mapred.dfsclient.parallelism.max
> ------------------------------------------------------------------------------
>                 Key: HIVE-15881
>                 URL:
>             Project: Hive
>          Issue Type: Task
>          Components: Query Planning
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>            Priority: Minor
> The Utilities class has two methods, {{getInputSummary}} and {{getInputPaths}}, that
> use the variable {{mapred.dfsclient.parallelism.max}} to get the summary of a list of input
> locations in parallel. These methods are Hive related, but the variable name does not look
> like it is specific to Hive.
> Also, the above variable is not in HiveConf and is not used anywhere else. I only found a
> reference in the Hadoop MR1 code.
> I'd like to propose the deprecation of {{mapred.dfsclient.parallelism.max}}, and use
> a different variable name, such as {{hive.get.input.listing.num.threads}}, that reflects the
> intention of the variable. The removal of the old variable might happen in Hive 3.x.

This message was sent by Atlassian JIRA
