metron-issues mailing list archives

From "Justin Leet (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (METRON-1699) Create Batch Profiler
Date Tue, 11 Dec 2018 16:13:00 GMT

     [ https://issues.apache.org/jira/browse/METRON-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Justin Leet updated METRON-1699:
--------------------------------
    Fix Version/s:     (was: Next + 1)
                   0.7.0

> Create Batch Profiler
> ---------------------
>
>                 Key: METRON-1699
>                 URL: https://issues.apache.org/jira/browse/METRON-1699
>             Project: Metron
>          Issue Type: Improvement
>            Reporter: Nick Allen
>            Assignee: Nick Allen
>            Priority: Major
>             Fix For: 0.7.0
>
>         Attachments: Screen Shot 2018-07-27 at 10.55.27 AM.png, Screen Shot 2018-07-27 at 11.07.33 AM.png, Screen Shot 2018-07-27 at 11.10.16 AM.png
>
>
> Create a Batch Profiler that satisfies the following use cases.
> h3. Use Cases
>  * As a Security Data Scientist, I want to understand the historical behaviors and trends of a profile that I have created so that I can determine if I have created a feature set that has predictive value for model building.
>  * As a Security Data Scientist, I want to understand the historical behaviors and trends of a profile that I have created so that I can determine if I have defined the profile correctly and created a feature set that matches reality.
>  * As a Security Platform Engineer, I want to generate a profile using archived telemetry when I deploy a new model to production so that models depending on that profile can function on day 1.
> h3. Goal
>  * Currently, a profile can only be generated from the telemetry consumed *after* the profile was created.
>  * The goal would be to enable “profile seeding” which allows profiles to be populated from a time *before* the profile was created.
>  * A profile would be seeded using the telemetry that has been archived by Metron in HDFS.
>  * A profile consumer should not be able to distinguish the “seeded” portion of a profile.
> !Screen Shot 2018-07-27 at 10.55.27 AM.png!
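Seeding hinges on assigning archived messages to profile periods by their own timestamps. The Python sketch below is an illustration only, not the Metron implementation. It assumes a fixed 15-minute period, an epoch-millisecond `timestamp` field in every archived message, and a simple per-period message count as a stand-in for a real profile. The names `PERIOD_MILLIS`, `period_of`, and `seed_counts` are hypothetical. Because the period index depends only on the message's timestamp, a period computed from archived telemetry lines up with one computed from live telemetry. This alignment is what would keep the seeded portion of a profile indistinguishable from the rest.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping, Tuple

# Hypothetical period duration of 15 minutes; a real profile defines its own.
PERIOD_MILLIS = 15 * 60 * 1000


def period_of(timestamp_millis: int, period_millis: int = PERIOD_MILLIS) -> int:
    """Return the index of the fixed-width period that contains an event timestamp."""
    return timestamp_millis // period_millis


def seed_counts(
    archived: Iterable[Mapping[str, Any]],
    entity_field: str,
    period_millis: int = PERIOD_MILLIS,
) -> Dict[Tuple[Any, int], int]:
    """Count archived messages per (entity, period) using each message's own timestamp.

    Assumes every message carries an epoch-millisecond 'timestamp' field.
    """
    counts: Dict[Tuple[Any, int], int] = defaultdict(int)
    for msg in archived:
        key = (msg[entity_field], period_of(msg["timestamp"], period_millis))
        counts[key] += 1
    return dict(counts)


# Example archive (hypothetical data): two messages in period 0, one in period 1.
archived = [
    {"ip_src_addr": "10.0.0.1", "timestamp": 0},
    {"ip_src_addr": "10.0.0.1", "timestamp": 60_000},
    {"ip_src_addr": "10.0.0.2", "timestamp": 900_000},
]
print(seed_counts(archived, "ip_src_addr"))
# {('10.0.0.1', 0): 2, ('10.0.0.2', 1): 1}
```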
> h3. Current State
>  * There are currently two ports of the Profiler: the Streaming Profiler, which handles streaming data in Storm, and another that runs in the REPL and allows a user to manually build, test, and debug profiles.
>  * These ports largely share a common code base in metron-analytics/metron-profiler-common.
>  * A smaller set of “orchestration” logic is required to maintain each port; one for Storm, another for the REPL.
>  * Both Profiler ports support both system time and event time processing.
> !Screen Shot 2018-07-27 at 11.07.33 AM.png!
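The difference between system time and event time comes down to which clock assigns a message to a period. The following Python sketch is a hypothetical illustration, not Metron's code. The names `system_time_clock` and `event_time_clock` are invented, and the default `timestamp` field name is an assumption. System time stamps a message with the wall clock at the moment it is processed. Event time reads the timestamp the message already carries. That is why a batch job replaying archived telemetry can only produce meaningful periods on event time.

```python
import time
from typing import Any, Callable, Mapping

Message = Mapping[str, Any]
Clock = Callable[[Message], int]


def system_time_clock(_message: Message) -> int:
    """System (processing) time: the wall clock when the message is handled, in ms."""
    return int(time.time() * 1000)


def event_time_clock(field: str = "timestamp") -> Clock:
    """Event time: read the epoch-millisecond timestamp carried in the message."""
    def clock(message: Message) -> int:
        return int(message[field])
    return clock


# Replaying a message archived in July 2018: only event time recovers when it happened.
archived_msg = {"timestamp": 1532700000000}
replay_clock: Clock = event_time_clock()
print(replay_clock(archived_msg))  # 1532700000000
```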
> h3. Approach
>  * Create a third port of the Profiler: the Batch Profiler.
>  * The Batch Profiler will be built to run in Spark so that the telemetry can be consumed in batch.
>  * Allows a user to seed profiles using the JSON telemetry that is archived in HDFS by Metron Indexing.
>  * Only generates the profile data stored in HBase, not the messages that are produced for Threat Triage and Kafka.
>  * Any number of profiles can be generated at once, but no dependencies between the profiles are supported. A dependency is where one profile is a consumer of the profile generated by another.
>  * The Batch Profiler must use the timestamps contained within the telemetry; it runs on event time. Luckily, the Profiler already supports event time.
>  * Enable a pluggable mechanism so that telemetry stored in different formats can be consumed by the Batch Profiler. For example, the Profiler should be able to consume telemetry stored as raw JSON or in other formats like ORC or Parquet.
> !Screen Shot 2018-07-27 at 11.10.16 AM.png!
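The pluggable-format and independent-profile requirements could be organized roughly as in the Python sketch below. It is a hypothetical illustration, not the Batch Profiler's actual Spark code. The names `READERS`, `register_reader`, `read_json_lines`, and `run_profiles` are invented, and the JSON reader assumes one message per line. ORC or Parquet support would mean registering additional readers under their own format names. Profiles here are plain functions over the message list. Each one is evaluated against the raw telemetry only, mirroring the "no dependencies between profiles" constraint.

```python
import json
from typing import Any, Callable, Dict, Iterable, Iterator, List

Message = Dict[str, Any]
Reader = Callable[[Iterable[str]], Iterator[Message]]
Profile = Callable[[List[Message]], Any]

# Hypothetical registry mapping a storage format name to a reader function.
READERS: Dict[str, Reader] = {}


def register_reader(fmt: str) -> Callable[[Reader], Reader]:
    """Decorator that plugs a reader in under a format name."""
    def decorator(fn: Reader) -> Reader:
        READERS[fmt] = fn
        return fn
    return decorator


@register_reader("json")
def read_json_lines(lines: Iterable[str]) -> Iterator[Message]:
    """Parse one JSON message per non-blank line."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


def run_profiles(fmt: str, lines: Iterable[str], profiles: Dict[str, Profile]) -> Dict[str, Any]:
    """Read telemetry once, then apply each profile to it independently."""
    if fmt not in READERS:
        raise ValueError(f"no reader registered for format {fmt!r}")
    messages = list(READERS[fmt](lines))
    # Each profile sees only the raw telemetry, never another profile's output.
    return {name: profile(messages) for name, profile in profiles.items()}


lines = ['{"ip_src_addr": "10.0.0.1", "bytes": 10}', "", '{"ip_src_addr": "10.0.0.1", "bytes": 5}']
profiles = {"count": len, "total_bytes": lambda ms: sum(m["bytes"] for m in ms)}
print(run_profiles("json", lines, profiles))  # {'count': 2, 'total_bytes': 15}
```

Keeping readers behind a registry means that adding a storage format only adds a reader, leaving the profile logic untouched.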



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
