metron-dev mailing list archives

From Matt Foley <mfo...@hortonworks.com>
Subject Re: [DISCUSS] Opinionated Data Flows
Date Thu, 06 Oct 2016 22:25:46 GMT
Would splitting and joining be implicit or explicit, for multi-path topologies?
________________________________________
From: Zeolla@GMail.com <zeolla@gmail.com>
Sent: Thursday, October 06, 2016 11:03 AM
To: dev@metron.incubator.apache.org
Subject: Re: [DISCUSS] Opinionated Data Flows

It should also be smart enough to handle an order like:

source("bro")
  -> parser("BasicBroParser")
  -> exists("ip_src_addr")
  -> geo_ip_src = geo["ip_src_addr"]
  -> application = assets["ip_src_addr"].application
  -> owner = assets["ip_src_addr"].owner
  -> exists("ip_dst_addr")
  -> geo_ip_dst = geo["ip_dst_addr"]
  -> elasticsearch("bro-index")

Without duplicate hits of the topologies.
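
One way to picture "no duplicate hits" is a small planner that groups the steps by the topology that would run them, so each topology is visited once regardless of the order the user wrote. This is only a sketch in Python, not Metron code. The step-to-topology mapping (`STAGE_OF`), the function names, and the idea that `exists` checks run in the parser topology are all assumptions made for illustration. It also assumes an `exists` check never depends on an enriched field, so moving it ahead of an enrichment is safe.

```python
from itertools import groupby

# Hypothetical mapping from a step kind to the topology that would run it.
# Placing "exists" in the parser topology is an assumption for this sketch.
STAGE_OF = {
    "source": "parser",
    "parser": "parser",
    "exists": "parser",
    "geo": "enrichment",
    "assets": "enrichment",
    "elasticsearch": "indexing",
}
STAGE_ORDER = ["parser", "enrichment", "indexing"]


def stage_of(step):
    """Return the (assumed) topology for a (kind, argument) step."""
    return STAGE_OF[step[0]]


def naive_hits(steps):
    """Topologies visited if the steps run strictly in the order written."""
    return [stage for stage, _ in groupby(steps, key=stage_of)]


def plan(steps):
    """Group steps so each topology is visited once.

    sorted() is stable, so the user's order is kept within a topology.
    """
    ranked = sorted(steps, key=lambda s: STAGE_ORDER.index(stage_of(s)))
    return [(stage, list(group)) for stage, group in groupby(ranked, key=stage_of)]


# The flow above, written as (kind, argument) pairs.
FLOW = [
    ("source", "bro"),
    ("parser", "BasicBroParser"),
    ("exists", "ip_src_addr"),
    ("geo", "ip_src_addr"),
    ("assets", "ip_src_addr.application"),
    ("assets", "ip_src_addr.owner"),
    ("exists", "ip_dst_addr"),
    ("geo", "ip_dst_addr"),
    ("elasticsearch", "bro-index"),
]

if __name__ == "__main__":
    print(naive_hits(FLOW))
    for stage, steps in plan(FLOW):
        print(stage, steps)
```

Under these assumptions, running the flow as written would bounce between the parser and enrichment topologies, while `plan(FLOW)` visits parser, enrichment, and indexing once each.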

Jon

On Thu, Oct 6, 2016 at 1:55 PM Nick Allen <nick@nickallen.org> wrote:

> Here is a quick example with some hypothetical syntax.  Whatever that syntax
> might be, it would be very simple, easy to understand, and leverage
> high-level concepts specific to Metron.
>
> This flow consumes Bro data, ensures there are valid source/destination
> IPs, performs geo-enrichment, asset enrichment and finally persists the
> data in Elasticsearch.
>
>
> source("bro")
>   -> parser("BasicBroParser")
>   -> exists("ip_src_addr")
>   -> exists("ip_dst_addr")
>   -> geo_ip_src = geo["ip_src_addr"]
>   -> geo_ip_dst = geo["ip_dst_addr"]
>   -> application = assets["ip_src_addr"].application
>   -> owner = assets["ip_src_addr"].owner
>   -> elasticsearch("bro-index")
>
>
>
>
> On Thu, Oct 6, 2016 at 12:58 PM, Nick Allen <nick@nickallen.org> wrote:
>
> > Chasing this bad idea down even further leads me to something even
> > crazier.
> >
> > Stellar 1.0 can only operate within a single topology and in most cases
> > only on a single message.  Stellar 2.0 could be the mechanism that allows
> > users to define their own data flows and what "useful bits of Metron
> > functionality" get plugged in.
> >
> > Once you have a DSL that allows users to define what they want Metron to
> > do, the underlying implementation mechanism (which is currently Storm)
> > can also be swapped out.  If we have an even faster Storm implementation,
> > then we swap in the Storm NG engine.  Maybe we want Metron to also run in
> > Flink; then we just swap in a Flink engine.
> >
> >
> >
> >
> > On Thu, Oct 6, 2016 at 12:52 PM, Nick Allen <nick@nickallen.org> wrote:
> >
> >> I totally "bird dogged the previous thread" as Casey likes to call it. :)
> >> I am extracting this thought into a separate thread before I start
> >> throwing out even more, crazier ideas.
> >>
> >>> In general, Metron is very opinionated about data flows right now.  We
> >>> have Parser topologies that feed an Enrichment topology, which then feeds
> >>> an Indexing topology.  We have useful bits of functionality (think Stellar
> >>> transforms, Geo enrichment, etc) that are closely coupled with these
> >>> topologies (aka data flows).
> >>>
> >>
> >>
> >>> When a user wants to parse heterogeneous data from a single topic, that's
> >>> not easy.  When a user wants enriched output to land in unique topics by
> >>> sensor type, well, that's also not easy.  When a user wanted to skip
> >>> enrichment of data sources, we actually re-architected the data flow to add
> >>> the Indexing topology.
> >>>
> >>
> >>
> >>> In an ideal world, a user should be responsible for defining the data
> >>> flow, not Metron.  Metron should provide the "useful bits of functionality"
> >>> that a user can "plug in" wherever they like.  Metron itself should not care
> >>> how the data is moving or what step in the process it is at.
> >>
> >>
> >>
> >>
> >> --
> >> Nick Allen <nick@nickallen.org>
> >>
> >
> >
> >
> > --
> > Nick Allen <nick@nickallen.org>
> >
>
>
>
> --
> Nick Allen <nick@nickallen.org>
>
--

Jon
