hive-issues mailing list archives

From "David Mollitor (Jira)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-22402) Deprecate and Replace Hive PerfLogger
Date Fri, 01 Nov 2019 14:40:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-22402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964876#comment-16964876 ]

David Mollitor commented on HIVE-22402:
---------------------------------------

Hello watchers.  Any thoughts on this? :)

> Deprecate and Replace Hive PerfLogger
> -------------------------------------
>
>                 Key: HIVE-22402
>                 URL: https://issues.apache.org/jira/browse/HIVE-22402
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 4.0.0
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Major
>         Attachments: HIVE-22402.1.patch, HIVE-22402.2.patch, HIVE-22402.3.patch, HIVE-22402.4.patch
>
>
> Recently I wanted to add some additional capability, and more performance logging,
to support my troubleshooting efforts. I started looking at PerfLogger and examining
its usage, and I discovered a few things:
>  # Since 'loggers' must be opened and closed manually, I found a couple of places where
loggers were opened but never closed, rendering them useless
>  # Since 'loggers' must be closed manually, I found a few places where an early return
or a thrown exception would prevent a logger from being closed, again rendering it useless
>  # Session information is not logged, so it can be difficult to precisely pinpoint which
session is taking lots of time
>  # PerfLogger is overloaded. Most of the time, it's used as a simple timer mechanism
with automatic logging at SLF4J DEBUG level. However, it is also a facade over the Hive Metrics
subsystem: timing results are automatically published to Metrics, which creates a dependency
on a 'logger' just to be able to access metric data.
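The first two problems come from pairing a begin call with a separate end call. The sketch below is a minimal illustration, assuming a hypothetical {{ManualTimer}} class (not Hive's actual {{PerfLogger}} API): an early return skips {{end()}} and leaves the measurement open forever, while wrapping the body in try/finally fixes it at the cost of repeating that boilerplate at every call site.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a manual begin/end timer API (not Hive's PerfLogger).
class ManualTimer {
    private final Map<String, Long> open = new HashMap<>();
    private final Map<String, Long> completed = new HashMap<>();

    void begin(String name) {
        open.put(name, System.nanoTime());
    }

    void end(String name) {
        Long start = open.remove(name);
        if (start != null) {
            completed.put(name, System.nanoTime() - start);
        }
    }

    boolean isOpen(String name) { return open.containsKey(name); }

    boolean isCompleted(String name) { return completed.containsKey(name); }
}

class ManualTimerDemo {
    // The early return skips end(), so the measurement is never recorded.
    static int compileLeaky(ManualTimer timer, String query) {
        timer.begin("compile");
        if (query.isEmpty()) {
            return -1; // "compile" stays open forever
        }
        int result = query.length();
        timer.end("compile");
        return result;
    }

    // try/finally runs end() on every exit path, including exceptions,
    // but every call site has to remember to write it.
    static int compileSafe(ManualTimer timer, String query) {
        timer.begin("compile");
        try {
            if (query.isEmpty()) {
                return -1;
            }
            return query.length();
        } finally {
            timer.end("compile");
        }
    }
}
```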
> The last bullet is the most challenging part, and it is why I propose to deprecate the Hive
{{PerfLogger}} rather than simply remove it. I am proposing a new system: a {{PerfTimer}} that
supports Java's try-with-resources statement, so that developers do not have to close measurements
manually or carefully consider every early exit. The base implementation logs to SLF4J. An extended
version also publishes automatically to the Hive Metrics subsystem.
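A minimal sketch of that idea, under stated assumptions: {{PerfTimer}} implements {{AutoCloseable}}, so the timing is recorded on every exit path of a try-with-resources block. To keep the example self-contained, a {{Consumer<String>}} sink stands in for the SLF4J logger, and the hypothetical {{MetricsPerfTimer}} subclass publishes into a plain map standing in for the Hive Metrics subsystem. Neither is taken from the attached patches.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Sketch of a try-with-resources timer. The Consumer<String> sink is an
// assumption standing in for an SLF4J logger.
class PerfTimer implements AutoCloseable {
    private final String name;
    private final Consumer<String> sink;
    private final long startNanos = System.nanoTime();
    private long elapsedNanos = -1; // set when the timer is closed
    private boolean closed;

    PerfTimer(String name, Consumer<String> sink) {
        this.name = name;
        this.sink = sink;
    }

    String name() { return name; }

    long elapsedNanos() { return elapsedNanos; }

    // Invoked automatically when the try-with-resources block exits, whether by
    // normal completion, early return, or exception.
    @Override
    public void close() {
        if (closed) {
            return; // idempotent: a second close() is a no-op
        }
        closed = true;
        elapsedNanos = System.nanoTime() - startNanos;
        sink.accept(name + " took " + TimeUnit.NANOSECONDS.toMillis(elapsedNanos) + " ms");
        onClose();
    }

    // Hook for subclasses that also publish the measurement elsewhere.
    protected void onClose() {}
}

// Hypothetical extended timer; the map stands in for the Metrics subsystem.
class MetricsPerfTimer extends PerfTimer {
    private final Map<String, Long> registry;

    MetricsPerfTimer(String name, Consumer<String> sink, Map<String, Long> registry) {
        super(name, sink);
        this.registry = registry;
    }

    @Override
    protected void onClose() {
        registry.merge(name(), elapsedNanos(), Long::sum);
    }
}

class PerfTimerDemo {
    static int compile(String query, List<String> log, Map<String, Long> metrics) {
        try (PerfTimer timer = new MetricsPerfTimer("compile", log::add, metrics)) {
            if (query.isEmpty()) {
                return -1; // the timer is still closed, logged, and published
            }
            return query.length();
        }
    }
}
```

Keeping metrics publication behind a subclass hook means the simple timer has no dependency on Metrics at all; only call sites that opt into the extended version publish there.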
> The current Hive {{PerfLogger}} has a somewhat clunky system for allowing pluggable implementations.
However, the current default implementation has a side effect of also publishing timing information
to the Hive Metrics subsystem. There are code sections that look up various timers in the
Metrics subsystem and publish the results back to the client. Since, in theory, the implementation
is pluggable, any other implementation that does not share this side effect of publishing
to the Metrics subsystem will break these non-optional code paths. Also, these code paths
create and interact with {{PerfLogger}} instances in a static way, and the publishing code then pulls
the data from the {{PerfLogger}} (as a facade to the Metrics subsystem) in a static way.
Therefore, when I tried to replace the entire {{PerfLogger}} code, I ran into an issue:
there is not (and should not be) a way to statically pull this information
from any point in the code. Information that is required for publishing should be passed around
within some sort of context object, separate from the Metrics subsystem. There was no obvious
way to thread a new {{PerfTimer}} through to all the required locations. I propose marking the
{{PerfLogger}} as deprecated and leaving these complex sections alone. Instead, replace only the
simple "I want a timer" use cases.
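The context-object alternative can be sketched as follows, assuming a hypothetical per-query {{QueryTimings}} class: timers record into the context they were created from, and the publishing code reads from that same object, so nothing has to be looked up statically from a logger or from the Metrics subsystem.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical per-query context that collects timings explicitly.
class QueryTimings {
    private final Map<String, Long> elapsedNanos = new LinkedHashMap<>();

    // Returns a measurement whose close() records into this context.
    Measurement time(String name) {
        return new Measurement(name);
    }

    // Read-only copy for the publishing code, in insertion order.
    Map<String, Long> snapshot() {
        return Collections.unmodifiableMap(new LinkedHashMap<>(elapsedNanos));
    }

    final class Measurement implements AutoCloseable {
        private final String name;
        private final long startNanos = System.nanoTime();
        private boolean closed;

        private Measurement(String name) {
            this.name = name;
        }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                elapsedNanos.merge(name, System.nanoTime() - startNanos, Long::sum);
            }
        }
    }
}

class QueryTimingsDemo {
    // The context is passed in explicitly, so no static lookup is needed.
    static void runQuery(QueryTimings timings) {
        try (QueryTimings.Measurement m = timings.time("compile")) {
            // compile the query here
        }
        try (QueryTimings.Measurement m = timings.time("execute")) {
            // execute the query here
        }
    }

    // Publishing code reads from the same context object it was handed.
    static String summary(QueryTimings timings) {
        return String.join(",", timings.snapshot().keySet());
    }
}
```

The cost is that the context has to reach every place that times something, which is exactly the plumbing that is missing for the existing static code paths.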



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
