metron-dev mailing list archives

From mmiklavc <...@git.apache.org>
Subject [GitHub] incubator-metron pull request #401: METRON-637: Add a STATS_BIN function to ...
Date Tue, 10 Jan 2017 14:00:04 GMT
Github user mmiklavc commented on a diff in the pull request:

https://github.com/apache/incubator-metron/pull/401#discussion_r95368025

--- Diff: metron-analytics/metron-statistics/src/main/java/org/apache/metron/statistics/StellarStatisticsFunctions.java ---
@@ -425,4 +428,57 @@ public Object apply(List<Object> args) {
return result;
}
}
+
+  /**
+   * Calculates the statistical bin that a value falls in.
+   */
+  @Stellar(namespace = "STATS", name = "BIN"
+          , description = "Computes the bin that the value is in based on the statistical distribution."
+          , params = {
+          "stats - The Stellar statistics object"
+          , "value - The value to bin"
+          , "bounds? - A list of percentile bin bounds (excluding min and max) or a string representing a known and common set of bins.  " +
+          "For convenience, we have provided QUARTILE, QUINTILE, and DECILE which you can pass in as a string arg." +
+          " If this argument is omitted, then we assume a Quartile bin split."
+                    }
+          ,returns = "Which bin N the value falls in such that bound(N-1) < value <= bound(N). " +
+          "No min and max bounds are provided, so values smaller than the 0'th bound go in the 0'th bin, " +
+          "and values greater than the last bound go in the M'th bin."
+  )
+  public static class StatsBin extends BaseStellarFunction {
+    public enum BinSplits {
+      QUARTILE(ImmutableList.of(25.0, 50.0, 75.0)),
+      QUINTILE(ImmutableList.of(20.0, 40.0, 60.0, 80.0)),
+      DECILE(ImmutableList.of(10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0))
+      ;
+      public final List<Number> split;
+      BinSplits(List<Number> split) {
+        this.split = split;
+      }
+
+      public static List<Number> getSplit(Object o) {
+        if(o instanceof String) {
+          return BinSplits.valueOf((String)o).split;
+        }
+        else if(o instanceof List) {
+          return ConversionUtils.convert(o, List.class);
+        }
+        throw new IllegalStateException("The split you tried to pass is not a valid split: " + o.toString());
+      }
+    }
+
+
+    @Override
+    public Object apply(List<Object> args) {
+      StatisticsProvider stats = convert(args.get(0), StatisticsProvider.class);
+      Double value = convert(args.get(1), Double.class);
+      final List<Number> bins = args.size() > 2?BinSplits.getSplit(args.get(2)):BinSplits.QUARTILE.split;
+
+      if (stats == null || value == null || bins.size() == 0) {
+        return -1;
+      }
+      return MathFunctions.Bin.getBin(value, bins.size(), bin -> stats.getPercentile(bins.get(bin).doubleValue()));
--- End diff --

Nice suggestion by Matt. I also like the reuse of the math bin code and the
ability to plug in a stats function provider.
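
To make the bin semantics in the `returns` text concrete, here is a minimal sketch of
how a lookup with a pluggable bound function could behave. It is not the body of
`MathFunctions.Bin.getBin`, which is not shown in this diff. `BinSketch`, its `getBin`,
and the nearest-rank `percentile` helper are hypothetical names for illustration, and
the real `StatisticsProvider.getPercentile` may compute percentiles differently.

```java
import java.util.function.IntFunction;

// Hypothetical illustration only; not Metron code.
final class BinSketch {

  // Returns the first bin N with value <= bound(N); values above every
  // bound fall into bin numBounds (the "M'th" bin in the description).
  static int getBin(double value, int numBounds, IntFunction<Double> boundAt) {
    for (int bin = 0; bin < numBounds; bin++) {
      if (value <= boundAt.apply(bin)) {
        return bin;
      }
    }
    return numBounds;
  }

  // Assumed stand-in for stats.getPercentile: nearest-rank percentile
  // over an already sorted, non-empty sample.
  static double percentile(double[] sorted, double p) {
    int rank = (int) Math.ceil(p / 100.0 * sorted.length);
    int idx = Math.min(Math.max(rank - 1, 0), sorted.length - 1);
    return sorted[idx];
  }

  public static void main(String[] args) {
    // Sample 1..100, so the quartile bounds are 25, 50, and 75.
    double[] sample = new double[100];
    for (int i = 0; i < sample.length; i++) {
      sample[i] = i + 1;
    }
    double[] quartiles = {25.0, 50.0, 75.0};
    IntFunction<Double> bound = i -> percentile(sample, quartiles[i]);

    System.out.println(getBin(25.0, quartiles.length, bound)); // 0
    System.out.println(getBin(26.0, quartiles.length, bound)); // 1
    System.out.println(getBin(80.0, quartiles.length, bound)); // 3
  }
}
```

Passing the bound lookup as a function, as the patch does with
`bin -> stats.getPercentile(...)`, keeps the bin search independent of where the
bounds come from.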

