hive-issues mailing list archives

From "Hive QA (JIRA)" <>
Subject [jira] [Commented] (HIVE-15422) HiveInputFormat::pushProjectionsAndFilters paths comparison generates huge number of objects for partitioned dataset
Date Wed, 14 Dec 2016 00:27:01 GMT


Hive QA commented on HIVE-15422:

Here are the results of testing the latest attachment:

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10815 tests executed
*Failed tests:*
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a] (batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1] (batchId=150)

Test results:
Console output:
Test logs:

Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed

This message is automatically generated.

ATTACHMENT ID: 12843119 - PreCommit-HIVE-Build

> HiveInputFormat::pushProjectionsAndFilters paths comparison generates huge number of objects for partitioned dataset
> --------------------------------------------------------------------------------------------------------------------
>                 Key: HIVE-15422
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Minor
>         Attachments: HIVE-15422.1.patch, HIVE-15422.2.patch, Profiler_Snapshot_HIVE-15422.png
> When executing the following query in LLAP (single instance) on a 5-node cluster, heavy GC pressure was observed.
> {noformat}
> select a.type, , a.frequency,,,, b.lon
> from (select  'depart' as type, origin as city, count(origin) as frequency
> from flights
>   group by origin
>   order by frequency desc, type) as a 
> left join airports as b on = b.iata
> order by frequency desc;
> {noformat}
> The flights table has 7000+ partitions in S3. Profiling revealed that a large number of objects are created just for path comparisons in HiveInputFormat. HIVE-15405 reduces the number of path comparisons in FileUtils, but HiveInputFormat::pushProjectionsAndFilters still performs many comparisons.
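To make the cost pattern concrete, here is a minimal Java sketch under stated assumptions. It is a hypothetical illustration, not Hive's code and not the HIVE-15422 patch. Plain strings stand in for Hadoop `Path` objects, and `PartitionAliasLookup`, `partitionToAliases`, `naiveLookup`, and `indexedLookup` are invented names. The naive version scans every partition entry for each split path and allocates a new string per entry. The indexed version probes a hash map once per directory level of the split path, so its work depends on path depth rather than partition count.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not Hive's HiveInputFormat code and not the
// HIVE-15422 patch. Strings stand in for Hadoop Path objects.
class PartitionAliasLookup {

    // Counts the prefix strings built by naiveLookup, so the allocation
    // difference between the two approaches is observable in this sketch.
    static long naiveAllocations = 0;

    // Naive: for every partition directory, build "<dir>/" and test whether the
    // split path equals the directory or lies under it. Allocates one new String
    // per partition on every call, i.e. O(#partitions) objects per split.
    static List<String> naiveLookup(Map<String, List<String>> partitionToAliases,
                                    String splitPath) {
        List<String> aliases = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : partitionToAliases.entrySet()) {
            String prefix = e.getKey() + "/";
            naiveAllocations++;
            if (splitPath.equals(e.getKey()) || splitPath.startsWith(prefix)) {
                aliases.addAll(e.getValue());
            }
        }
        return aliases;
    }

    // Indexed: probe the map with the split path itself, then with each parent
    // directory in turn. Creates one substring per directory level, so the cost
    // is bounded by path depth instead of the number of partitions.
    static List<String> indexedLookup(Map<String, List<String>> partitionToAliases,
                                      String splitPath) {
        List<String> aliases = new ArrayList<>();
        String current = splitPath;
        while (!current.isEmpty()) {
            List<String> hit = partitionToAliases.get(current);
            if (hit != null) {
                aliases.addAll(hit);
            }
            int slash = current.lastIndexOf('/');
            if (slash <= 0) {
                break; // no parent directory left to probe
            }
            current = current.substring(0, slash);
        }
        return aliases;
    }
}
```

With 7000 partitions, `naiveLookup` builds 7000 prefix strings for every split, while `indexedLookup` builds only one substring per directory level. The indexed probe works only if map keys and split paths share one normalized form (same scheme and authority, no trailing slash); real Hadoop paths also need handling for scheme-less or differently qualified forms, which this sketch leaves out.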

This message was sent by Atlassian JIRA
