hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aaron T. Myers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-8541) Better high-percentile latency metrics
Date Tue, 10 Jul 2012 05:51:34 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-8541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410068#comment-13410068

Aaron T. Myers commented on HADOOP-8541:

Hi Andrew, the patch looks pretty good to me. A few comments:

# Need license header in all newly-introduced files.
# There's no need for this sort of code in tests. Throwing an exception from a test case will
cause the test to fail:
+      } catch(IOException e) {
+        e.printStackTrace();
+        fail("Unexpected exception while fetching estimates from estimator.");
+        return;
+      }
# Our coding conventions say to put a single space between conditionals and opening parens,
e.g. "if (...)", and spaces around operators, e.g. "i = 0".
# Why are the instance variables in MutableQuantiles not private? If it's for testing, then
they should be marked with @VisibleForTesting, or made private and exposed via getters/setters
that are marked @VisibleForTesting.
# You should make the inner class RolloverSample static if possible. Given that that class
has a reference to the appropriate MutableQuantiles object, that shouldn't be hard.
# Could go for some class comments in MutableQuantiles and Quantile classes.
# You should put a blank line before instance variable comments, e.g.:
+  private int bufferCount = 0;
+  /**
+   * Array of Quantiles that we care about, along with desired error.
+   */
+  private final Quantile quantiles[];
# Recommend renaming SampleQuantiles#sample to SampleQuantiles#samples.
# You should probably pick a more descriptive name for the "Item" class, and perhaps add some
comments for the class and its instance variables.
# We use lower-camel casing, so rename "lower_delta" to "lowerDelta".
# Rather than keeping the buffer of samples in an array and manually keeping track of the
size of the array, how about just using an ArrayList or the like and calling List#sort?
# You should put the "else" on the same line as the associated closing curly brace.
# Rather than doing a full merge of the contents of the buffer into the samples list, and
then performing a compression operation to remove some samples, is it not possible to somehow
only perform an insert into the samples set if it's within the allowable error bounds?
# The "removed" local variable in SampleQuantiles#compress appears to never be used.
> Better high-percentile latency metrics
> --------------------------------------
>                 Key: HADOOP-8541
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8541
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: metrics
>    Affects Versions: 2.0.0-alpha
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>         Attachments: hadoop-8541-1.patch, hadoop-8541-2.patch, hadoop-8541-3.patch
> Based on discussion in HBASE-6261 and with some HDFS devs, I'd like to make better high-percentile
latency metrics a part of hadoop-common.
> I've already got a working implementation of [1], an efficient algorithm for estimating
quantiles on a stream of values. It allows you to specify arbitrary quantiles to track (e.g.
50th, 75th, 90th, 95th, 99th), along with tight error bounds. This estimator can be snapshotted
and reset periodically to get a feel for how these percentiles are changing over time.
> I propose creating a new MutableQuantiles class that does this. [1] isn't completely
without overhead (~1MB memory for reasonably sized windows), which is why I hesitate to add
it to the existing MutableStat class.
> [1] Cormode, Korn, Muthukrishnan, and Srivastava. "Effective Computation of Biased Quantiles
over Data Streams" in ICDE 2005.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira


View raw message