metron-dev mailing list archives

From Nick Allen <n...@nickallen.org>
Subject Re: [DISCUSS] Opinionated Data Flows
Date Fri, 07 Oct 2016 12:12:56 GMT
Whether it is explicit or implicit, I think that would be one of the major
benefits of having the expressiveness of a DSL.  I can choose to have some
enrichments run in parallel (the split/join that you are referring to) or
have some enrichments run serially.
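To make the parallel vs. serial distinction concrete, here is a rough Python sketch. It is not Metron code: the enrichment functions (`geo_enrich`, `asset_enrich`, `us_flag`) and the `run_parallel`/`run_serial` helpers are hypothetical. In the parallel (split/join) case every enrichment sees the same input message. In the serial case each enrichment sees the fields added before it, which is what a dependent enrichment like `us_flag` needs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Message = Dict[str, object]
Enrichment = Callable[[Message], Message]


# Hypothetical enrichments; each returns only the new fields it adds.
def geo_enrich(msg: Message) -> Message:
    country = "US" if msg.get("ip_src_addr") == "10.0.0.1" else "unknown"
    return {"geo_ip_src": country}


def asset_enrich(msg: Message) -> Message:
    return {"owner": "ops-team"}


def us_flag(msg: Message) -> Message:
    # Depends on geo_enrich's output, so it is only correct when run after it.
    return {"is_us": msg.get("geo_ip_src") == "US"}


def run_parallel(msg: Message, enrichments: List[Enrichment]) -> Message:
    """Split: every enrichment runs concurrently on the same input message.
    Join: merge all of their outputs into one message."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda enrich: enrich(msg), enrichments))
    joined = dict(msg)
    for fields in results:
        joined.update(fields)
    return joined


def run_serial(msg: Message, enrichments: List[Enrichment]) -> Message:
    """Each enrichment sees the fields added by the ones before it."""
    out = dict(msg)
    for enrich in enrichments:
        out.update(enrich(out))
    return out
```

With `msg = {"ip_src_addr": "10.0.0.1"}`, `run_parallel(msg, [geo_enrich, us_flag])` yields `is_us == False` because `us_flag` never sees `geo_ip_src`. `run_serial` with the same list yields `is_us == True`. Independent enrichments such as `geo_enrich` and `asset_enrich` give the same result either way.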

Having enrichments run serially is not something you can easily do with
Metron today.  You cannot use the output of one enrichment as the input to
another.

As a simple example, I have a blacklist of countries with which my
organization should not be doing business.  I need to use the IP to find
the location and then use the location to match against a blacklist.  I
need these enrichments to run serially.

source("netflow")
  -> parser("Netflow")
  -> exists("ip_src_addr")
  -> src_country = geo["ip_src_addr"].country
  -> is_alert = blacklist["src_country"]
  ...
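A minimal Python sketch of what the flow above would have to do, assuming hypothetical in-memory tables (`GEO_COUNTRY`, `BLACKLIST`) in place of a real GeoIP database and managed blacklist. Here the blacklist lookup is modeled as a simple set-membership test. The point is only the data dependency: `is_alert` cannot be computed until the geo step has produced `src_country`.

```python
from typing import Dict, Optional, Set

# Hypothetical lookup tables standing in for a GeoIP database and a blacklist.
GEO_COUNTRY: Dict[str, str] = {"203.0.113.7": "XX", "198.51.100.2": "US"}
BLACKLIST: Set[str] = {"XX"}


def enrich(msg: Dict[str, object]) -> Optional[Dict[str, object]]:
    # exists("ip_src_addr"): drop messages without a source IP.
    if "ip_src_addr" not in msg:
        return None
    out = dict(msg)
    # src_country = geo["ip_src_addr"].country  (None if the IP is unknown)
    out["src_country"] = GEO_COUNTRY.get(str(out["ip_src_addr"]))
    # is_alert = blacklist["src_country"]: consumes the previous step's output,
    # so these two enrichments must run serially, not in parallel.
    out["is_alert"] = out["src_country"] in BLACKLIST
    return out
```

For example, `enrich({"ip_src_addr": "203.0.113.7"})` sets `src_country` to `"XX"` and `is_alert` to `True`, while a message with no `ip_src_addr` is dropped.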




On Thu, Oct 6, 2016 at 6:25 PM, Matt Foley <mfoley@hortonworks.com> wrote:

> Would splitting and joining be implicit or explicit, for multi-path
> topologies?
> ________________________________________
> From: Zeolla@GMail.com <zeolla@gmail.com>
> Sent: Thursday, October 06, 2016 11:03 AM
> To: dev@metron.incubator.apache.org
> Subject: Re: [DISCUSS] Opinionated Data Flows
>
> It should also be smart enough to handle an order like:
>
> source("bro")
>   -> parser("BasicBroParser")
>   -> exists("ip_src_addr")
>   -> geo_ip_src = geo["ip_src_addr"]
>   -> application = assets["ip_src_addr"].application
>   -> owner = assets["ip_src_addr"].owner
>   -> exists("ip_dst_addr")
>   -> geo_ip_dst = geo["ip_dst_addr"]
>   -> elasticsearch("bro-index")
>
> Without duplicate hits of the topologies.
>
> Jon
>
> On Thu, Oct 6, 2016 at 1:55 PM Nick Allen <nick@nickallen.org> wrote:
>
> > Here is a quick example with some hypothetical syntax.  Whatever that
> > syntax might be, it would be very simple, easy to understand, and
> > leverage high-level concepts specific to Metron.
> >
> > This flow consumes Bro data, ensures there are valid source/destination
> > IPs, performs geo-enrichment, asset enrichment and finally persists the
> > data in Elasticsearch.
> >
> >
> > source("bro")
> >   -> parser("BasicBroParser")
> >   -> exists("ip_src_addr")
> >   -> exists("ip_dst_addr")
> >   -> geo_ip_src = geo["ip_src_addr"]
> >   -> geo_ip_dst = geo["ip_dst_addr"]
> >   -> application = assets["ip_src_addr"].application
> >   -> owner = assets["ip_src_addr"].owner
> >   -> elasticsearch("bro-index")
> >
> >
> >
> >
> > On Thu, Oct 6, 2016 at 12:58 PM, Nick Allen <nick@nickallen.org> wrote:
> >
> > > Chasing this bad idea down even further leads me to something even
> > > crazier.
> > >
> > > Stellar 1.0 can only operate within a single topology and, in most
> > > cases, only on a single message.  Stellar 2.0 could be the mechanism
> > > that allows users to define their own data flows and what "useful bits
> > > of Metron functionality" get plugged in.
> > >
> > > Once you have a DSL that allows users to define what they want Metron
> > > to do, the underlying implementation mechanism (which is currently
> > > Storm) can also be swapped out.  If we have an even faster Storm
> > > implementation, then we swap in the Storm NG engine.  If we want Metron
> > > to also run in Flink, then we just swap in a Flink engine.
> > >
> > >
> > >
> > >
> > > On Thu, Oct 6, 2016 at 12:52 PM, Nick Allen <nick@nickallen.org> wrote:
> > >
> > >> I totally "bird dogged the previous thread" as Casey likes to call it. :)
> > >> I am extracting this thought into a separate thread before I start
> > >> throwing out even more, crazier ideas.
> > >>
> > >>> In general, Metron is very opinionated about data flows right now.  We
> > >>> have Parser topologies that feed an Enrichment topology, which then
> > >>> feeds an Indexing topology.  We have useful bits of functionality
> > >>> (think Stellar transforms, Geo enrichment, etc.) that are closely
> > >>> coupled with these topologies (aka data flows).
> > >>>
> > >>
> > >>
> > >>> When a user wants to parse heterogeneous data from a single topic,
> > >>> that's not easy.  When a user wants enriched output to land in unique
> > >>> topics by sensor type, well, that's also not easy.  When a user wanted
> > >>> to skip enrichment of data sources, we actually re-architected the
> > >>> data flow to add the Indexing topology.
> > >>>
> > >>
> > >>
> > >>> In an ideal world, a user should be responsible for defining the data
> > >>> flow, not Metron.  Metron should provide the "useful bits of
> > >>> functionality" that a user can "plug in" wherever they like.  Metron
> > >>> itself should not care how the data is moving or what step in the
> > >>> process it is at.
> > >>
> > >>
> > >>
> > >>
> > >> --
> > >> Nick Allen <nick@nickallen.org>
> > >>
> > >
> > >
> > >
> > > --
> > > Nick Allen <nick@nickallen.org>
> > >
> >
> >
> >
> > --
> > Nick Allen <nick@nickallen.org>
> >
> --
>
> Jon
>



-- 
Nick Allen <nick@nickallen.org>
