hive-issues mailing list archives

From "Rajesh Balamohan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-21312) FSStatsAggregator::connect is slow
Date Thu, 28 Feb 2019 14:17:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-21312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780555#comment-16780555 ]

Rajesh Balamohan commented on HIVE-21312:
-----------------------------------------

Thanks for the review, [~kgyrtkirk]. Attaching the revised patch.

>> use try-with-resources instead of: in = new Input(fs.open(file.getPath())); and closing
it?
Fixed this.
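
For readers without the patch at hand, the try-with-resources shape the reviewer asks for can be sketched as follows. This is only an illustration: it uses plain java.io/java.nio streams instead of the `Input` and `fs.open(...)` calls quoted above, and the class and method names are hypothetical.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch, not the code from the patch.
final class StatsFileReadSketch {
    // Reads the first 4-byte int from the file. Both streams are closed
    // automatically when the try block exits, even if readInt() throws.
    static int readFirstInt(Path file) throws IOException {
        try (InputStream raw = Files.newInputStream(file);
             DataInputStream in = new DataInputStream(raw)) {
            return in.readInt();
        }
    }
}
```

No explicit close() or finally block is needed; resources are closed in reverse order of declaration.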

>> use the return value of the future instead of returning just null; that way you
will not need the "concurrent queue", as the executor already has it built in.
The <Void> return type is intentional, hence the null return. Getting the future's value inside
the submit loop would be a blocking call and would make the reads sequential, as each iteration has to wait for its result.
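
The pattern described in this reply can be sketched roughly as below. It is a minimal illustration under assumptions, not the patch: `readStats`, `readAll`, and the use of a `ConcurrentLinkedQueue` for results are hypothetical stand-ins. Tasks are submitted without waiting, and `get()` is called only after every task has been submitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not the code from the patch.
final class ParallelStatsReadSketch {
    // Stand-in for parsing one stats file (assumption for illustration).
    static String readStats(String fileName) {
        return "stats-of-" + fileName;
    }

    static Queue<String> readAll(List<String> fileNames, int poolSize)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        Queue<String> results = new ConcurrentLinkedQueue<>();
        try {
            List<Future<Void>> futures = new ArrayList<>();
            for (String name : fileNames) {
                // Submit only. Calling get() here would wait for this task
                // before the next one is even submitted.
                futures.add(pool.submit((Callable<Void>) () -> {
                    results.add(readStats(name));
                    return null; // Callable<Void> has nothing to return
                }));
            }
            // Wait once everything is in flight; get() also rethrows failures.
            for (Future<Void> f : futures) {
                f.get();
            }
        } finally {
            pool.shutdown();
        }
        return results;
    }
}
```

The reviewer's alternative, returning each result from the Callable and collecting it from the futures in a second loop, would also avoid the shared queue without blocking inside the submit loop.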

>> statsMap should be a local variable; I don't understand why it was a field
Fixed this.

>> the log messages will come from a totally different thread than the one they were logged
from before; this will make the log less readable - log them after they are back in the
original thread
The thread name pattern `stats-updater-thread-%d` should be helpful in debugging.
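
One way such thread names can be produced with only the JDK is sketched below. How the patch actually names its threads is not shown in this message, so the factory here is an assumption, and the class and method names are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not the code from the patch.
final class NamedThreadsSketch {
    // Builds a factory whose threads are named from a format such as
    // "stats-updater-thread-%d", numbered from 0.
    static ThreadFactory namedFactory(String nameFormat) {
        AtomicInteger counter = new AtomicInteger();
        return runnable -> {
            Thread t = new Thread(runnable,
                    String.format(nameFormat, counter.getAndIncrement()));
            t.setDaemon(true);
            return t;
        };
    }

    // Runs one task on a single-thread pool and reports the worker's name.
    static String nameOfFirstWorker() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(
                1, namedFactory("stats-updater-thread-%d"));
        try {
            Callable<String> task = () -> Thread.currentThread().getName();
            return pool.submit(task).get();
        } finally {
            pool.shutdown();
        }
    }
}
```

If the logging layout prints the thread name, each "read stats" message can then be traced back to the worker that emitted it.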

>> is there a reason to raise "read stats" log messages from trace to info? I think
it will just make noise
Fixed this. Moved this to trace.

>> I think it would be better to use cachedthreadpool - in case we are reading just
a few files - there is no reason to launch more threads
A fixed thread pool restricts the number of threads to the core pool size. If only a few files
are read, it will not go beyond the thread pool size. A cached thread pool, on the other hand, is
unbounded and can create too many threads. Hence the use of a fixed thread pool.
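
The difference can be checked with a small experiment, sketched below under assumptions: the class and method names are hypothetical, and the cast relies on the JDK's `Executors` factories returning `ThreadPoolExecutor` instances. Every task blocks until all tasks have been submitted, so the peak thread count shows how far each pool grows.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;

// Hypothetical sketch, not the code from the patch.
final class PoolBoundSketch {
    // Submits taskCount tasks that all block until released, then returns
    // the largest number of threads the pool ever had at once.
    static int peakThreads(ExecutorService service, int taskCount)
            throws InterruptedException {
        // Assumption: the service was created by Executors.newFixedThreadPool
        // or Executors.newCachedThreadPool, both backed by ThreadPoolExecutor.
        ThreadPoolExecutor pool = (ThreadPoolExecutor) service;
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(taskCount);
        for (int i = 0; i < taskCount; i++) {
            pool.execute(() -> {
                try {
                    release.await(); // hold the worker until all tasks are submitted
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        release.countDown();
        done.await();
        int peak = pool.getLargestPoolSize();
        pool.shutdown();
        return peak;
    }
}
```

With `Executors.newFixedThreadPool(4)`, 20 tasks peak at 4 threads and 2 tasks peak at 2, since workers are created on demand up to the pool size. With `Executors.newCachedThreadPool()`, the same 20 blocked tasks lead to 20 threads.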

> FSStatsAggregator::connect is slow
> ----------------------------------
>
>                 Key: HIVE-21312
>                 URL: https://issues.apache.org/jira/browse/HIVE-21312
>             Project: Hive
>          Issue Type: Improvement
>          Components: Statistics
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Trivial
>         Attachments: HIVE-21312.1.patch, HIVE-21312.2.patch, HIVE-21312.3.patch, HIVE-21312.4.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
