metron-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (METRON-392) Allow User to Define Custom 'Group By' for a Profile
Date Tue, 06 Sep 2016 22:24:20 GMT

    [ https://issues.apache.org/jira/browse/METRON-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15468809#comment-15468809 ]

ASF GitHub Bot commented on METRON-392:
---------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-metron/pull/230


> Allow User to Define Custom 'Group By' for a Profile
> ----------------------------------------------------
>
>                 Key: METRON-392
>                 URL: https://issues.apache.org/jira/browse/METRON-392
>             Project: Metron
>          Issue Type: Improvement
>            Reporter: Nick Allen
>            Assignee: Nick Allen
>              Labels: profiler
>
> Models built from Profile data will most often be trained and scored on subsets
> or segments of that data rather than on all of it.  For example, Mondays often
> look very different from Sundays.  When training and scoring a Monday, the model
> will use only data from previous Mondays.
> The current Profiler implementation embeds the day of week, week of month, month, and
> year in the row key before storing the data in HBase.  This is intended to sort the data so
> that training on a subset of it needs only a contiguous scan.  For example, a read
> can pull in data from Mondays only.
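As an illustration of the layout described above, here is a minimal Python sketch. The row key format, the field order, and the names calendar_row_key and prefix_scan are assumptions made for this example; the message does not give the Profiler's actual key format. A sorted Python list stands in for HBase's ordered row keys.

```python
from bisect import bisect_left
from datetime import datetime, timezone


def calendar_row_key(profile, entity, ts):
    """Build a hypothetical row key with calendar fields ahead of the timestamp."""
    t = datetime.fromtimestamp(ts, tz=timezone.utc)
    day_of_week = t.isoweekday()          # 1 = Monday ... 7 = Sunday
    week_of_month = (t.day - 1) // 7 + 1  # simplified definition (assumption)
    return (f"{profile}|{entity}|{day_of_week}|{week_of_month}"
            f"|{t.month:02d}|{t.year}|{ts:012d}")


def prefix_scan(sorted_keys, prefix):
    """Return the contiguous run of keys that start with the given prefix."""
    start = bisect_left(sorted_keys, prefix)
    end = start
    while end < len(sorted_keys) and sorted_keys[end].startswith(prefix):
        end += 1
    return sorted_keys[start:end]


# One measurement per day for four weeks, starting 2016-09-01 00:00 UTC.
start_ts = int(datetime(2016, 9, 1, tzinfo=timezone.utc).timestamp())
keys = sorted(
    calendar_row_key("logins", "10.0.0.1", start_ts + day * 86400)
    for day in range(28)
)

# Because day of week leads the calendar fields, all Mondays sit next to
# each other and can be read with a single prefix scan.
mondays = prefix_scan(keys, "logins|10.0.0.1|1|")
```

In this sketch a different segmentation, for example by holiday, would require a different key layout, which is the limitation the rest of the description addresses.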
> The problem with this approach is that segmenting the data properly for the specific
> problem at hand is as important to building an effective model as feature selection.
> Segmenting on day of week, week of month, etc. will not be applicable to many of the
> models that users build.
> In addition, no single way of segmenting the data applies to all Profiles.
> Each Profile is likely to need its data segmented differently.
> Users will also need to segment the data by elements that only
> make sense in their specific environment.  For example, a company will have its own holiday
> calendar or specific 'end-of-month' processing days that need to be taken into account.
> A user needs to be able to apply these custom elements to how the data is segmented.
> This change will allow a user to customize, as part of a Profile definition, how the data
> should be grouped when stored in HBase.
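
The proposed change can be pictured with a minimal Python sketch. Everything here is hypothetical: the "group_by" field, the row_key helper, the key format, and the holiday date are illustrative only and are not taken from the pull request. The idea shown is that the grouping values in the row key come from the Profile definition instead of a fixed set of calendar fields.

```python
from datetime import date, datetime, timezone

# Hypothetical site-specific holiday calendar; the date is illustrative only.
HOLIDAYS = {date(2016, 9, 5)}


def is_holiday(ts):
    """Environment-specific grouping: does the timestamp fall on a holiday?"""
    return datetime.fromtimestamp(ts, tz=timezone.utc).date() in HOLIDAYS


def day_of_week(ts):
    """Calendar grouping: 1 = Monday ... 7 = Sunday."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoweekday()


# Hypothetical Profile definition; "group_by" is an ordered list of
# functions chosen by the user for this particular Profile.
profile = {
    "name": "logins",
    "group_by": [is_holiday, day_of_week],
}


def row_key(profile, entity, ts):
    """Build a row key whose grouping fields are defined by the Profile."""
    groups = [str(fn(ts)) for fn in profile["group_by"]]
    return "|".join([profile["name"], entity, *groups, f"{ts:012d}"])


ts = int(datetime(2016, 9, 5, tzinfo=timezone.utc).timestamp())
key = row_key(profile, "10.0.0.1", ts)
# key == "logins|10.0.0.1|True|1|001473033600"
```

A Profile that needs no grouping would simply use an empty "group_by" list, leaving only the name, entity, and timestamp in the key.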



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
