hive-issues mailing list archives

From "Hive QA (JIRA)"
Subject [jira] [Commented] (HIVE-16014) HiveMetastoreChecker should use hive.metastore.fshandler.threads instead of for pool size
Date Fri, 24 Feb 2017 06:23:44 GMT


Hive QA commented on HIVE-16014:

Here are the results of testing the latest attachment:

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10258 tests executed
*Failed tests:*
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=223)
org.apache.hadoop.hive.thrift.TestHadoopAuthBridge23.testSaslWithHiveMetaStore (batchId=220)
org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgressParallel (batchId=211)

Test results:
Console output:
Test logs:

Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed

This message is automatically generated.

ATTACHMENT ID: 12854275 - PreCommit-HIVE-Build

> HiveMetastoreChecker should use hive.metastore.fshandler.threads instead of for pool size
> --------------------------------------------------------------------------------------------------------------
>                 Key: HIVE-16014
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>         Attachments: HIVE-16014.01.patch, HIVE-16014.02.patch
> HiveMetastoreChecker uses a configuration value to determine the pool size, as shown below:
> {noformat}
> private void checkPartitionDirs(Path basePath, Set<Path> allDirs, int maxDepth) throws IOException, HiveException {
>     ConcurrentLinkedQueue<Path> basePaths = new ConcurrentLinkedQueue<>();
>     basePaths.add(basePath);
>     Set<Path> dirSet = Collections.newSetFromMap(new ConcurrentHashMap<Path, Boolean>());
>     // Here we just reuse the move-files THREAD_COUNT configuration (HIVE_MOVE_FILES_THREAD_COUNT)
>     int poolSize = conf.getInt(ConfVars.HIVE_MOVE_FILES_THREAD_COUNT.varname, 15);
>     // Check if too low config is provided for move files. 2x CPU is reasonable max count.
>     poolSize = poolSize == 0 ? poolSize : Math.max(poolSize,
>         Runtime.getRuntime().availableProcessors() * 2);
> {noformat}
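The adjustment at the end of the quoted snippet can be isolated into a small standalone method to show what pool size actually results. This is a sketch only: `PoolSizeFloor` and `effectivePoolSize` are hypothetical names, HiveConf is not used, and the processor count is fixed at 8 instead of being read from `Runtime.getRuntime().availableProcessors()`. Because the expression uses `Math.max`, twice the processor count acts as a lower bound on the pool size, not an upper bound.

```java
// Standalone reproduction of the pool-size adjustment from the quoted snippet.
// PoolSizeFloor and effectivePoolSize are hypothetical names; HiveConf is not used.
final class PoolSizeFloor {

    // Same expression as the quoted code: 0 passes through unchanged, and any
    // other value is raised to at least twice the processor count.
    static int effectivePoolSize(int configured, int cpus) {
        return configured == 0 ? configured : Math.max(configured, cpus * 2);
    }

    public static void main(String[] args) {
        int cpus = 8; // assumed; the quoted code calls Runtime.getRuntime().availableProcessors()
        for (int configured : new int[] {0, 4, 15, 100}) {
            System.out.println(configured + " -> " + effectivePoolSize(configured, cpus));
        }
    }
}
```

With 8 processors this prints `0 -> 0`, `4 -> 16`, `15 -> 16` and `100 -> 100`.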
> msck is commonly used to add the missing partitions for a table from the filesystem. In such a
> case, different pool sizes for HMSHandler and HiveMetastoreChecker can affect performance. For
> example, if {{hive.metastore.fshandler.threads}} is set to a low value like 15 and {{}} is much
> higher, like 100 (or vice versa), the smaller pool will become the bottleneck. It would be good
> to use {{hive.metastore.fshandler.threads}} to size the pool for HiveMetastoreChecker, since the
> number of missing partitions and the number of partitions to be added will most likely be the
> same. In that case, query performance is best when both pools are the same size.
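The bottleneck argument can be checked with a back-of-the-envelope model. The sketch below assumes that every directory check and every partition add costs one unit of time and that the checker phase finishes before add_partitions starts; neither assumption comes from the Hive code, and `PoolBottleneck`, `rounds` and `totalRounds` are hypothetical names.

```java
// Back-of-the-envelope cost model; every assumption here is hypothetical:
// each directory check and each partition add costs one time unit, and the
// checker phase completes before add_partitions starts.
final class PoolBottleneck {

    // Rounds needed for a pool of poolSize threads to process n unit-cost items.
    static long rounds(long n, int poolSize) {
        return (n + poolSize - 1) / poolSize; // ceiling division for positive inputs
    }

    // Checker pass over n directories followed by adding n partitions.
    static long totalRounds(long n, int checkerPool, int handlerPool) {
        return rounds(n, checkerPool) + rounds(n, handlerPool);
    }

    public static void main(String[] args) {
        long missing = 1000;
        int[][] configs = {{15, 15}, {100, 15}, {15, 100}, {100, 100}};
        for (int[] c : configs) {
            System.out.printf("checker=%d handler=%d -> %d rounds%n",
                c[0], c[1], totalRounds(missing, c[0], c[1]));
        }
    }
}
```

Under these assumptions, with 1,000 missing partitions, 15/15 takes 134 rounds, 100/15 and 15/100 both take 77 (67 of them in the 15-thread phase), and 100/100 takes 20. Raising only one pool leaves the smaller one dominating the total.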
> Since both configs can be tuned independently, they are quite likely to differ. But because there
> is a strong correlation between the amount of work done by HiveMetastoreChecker and by the
> HiveMetastore.add_partitions call, it might be a good idea to use
> {{hive.metastore.fshandler.threads}} for the pool size instead of {{}}.
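The proposed change could look roughly like the sketch below, which reads the pool size from `hive.metastore.fshandler.threads`. It is a hypothetical illustration, not the attached patch: `java.util.Properties` stands in for HiveConf so the example runs without Hive on the classpath, `CheckerPoolSizing` and `checkerPoolSize` are made-up names, and the fallback of 15 simply mirrors the default passed to `conf.getInt` in the quoted code.

```java
import java.util.Properties;

// Hypothetical sketch of the proposed sizing. java.util.Properties stands in for
// HiveConf so the example runs without Hive on the classpath.
final class CheckerPoolSizing {

    static final String FS_HANDLER_THREADS = "hive.metastore.fshandler.threads";
    // Assumed fallback, mirroring the default passed to conf.getInt in the quoted code.
    static final int FALLBACK_THREADS = 15;

    // Sizes the checker pool from the same key that sizes the metastore FS handler
    // pool, so both pools are driven by one setting.
    static int checkerPoolSize(Properties conf) {
        String value = conf.getProperty(FS_HANDLER_THREADS);
        return value == null ? FALLBACK_THREADS : Integer.parseInt(value.trim());
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println("default: " + checkerPoolSize(conf));
        conf.setProperty(FS_HANDLER_THREADS, "40");
        System.out.println("configured: " + checkerPoolSize(conf));
    }
}
```

This prints `default: 15` and then `configured: 40`. The sketch uses the configured value as-is and drops the 2x-CPU floor, on the assumption that matching the handler pool exactly is the goal; keeping the floor would be an equally plausible choice.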

This message was sent by Atlassian JIRA
