metron-dev mailing list archives

From Carolyn Duby <cd...@hortonworks.com>
Subject RE: Enrich enrichment
Date Mon, 09 Jan 2017 18:59:29 GMT
Adding new topologies adds processing requirements to the system: more topics
(storage) and more Kafka producers and consumers (processing).

I think what we need is a way to express dependencies between enrichments.  We could either
derive the dependencies from the Stellar statements (potentially not that easy) or allow the
enrichment configuration to specify the order of enrichment calculations.
Either approach would allow users to calculate more enrichments in the same topology.
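
The ordering idea can be pictured as a topological sort over enrichment dependencies. The sketch below is an illustration only, not Metron code: the `deps` mapping and the `city_crime_level` name are assumptions borrowed from the example further down the thread, and how such dependencies would actually be derived from Stellar is left open.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each enrichment lists the fields it needs.
# These names are illustrative, not part of Metron's configuration schema.
deps = {
    "geo": {"destination_ip"},
    "city_crime_level": {"geo"},
}

def enrichment_order(deps):
    """Return enrichments in an order where every dependency comes first."""
    order = TopologicalSorter(deps).static_order()
    # Raw input fields appear in the graph too; keep only the enrichments.
    return [name for name in order if name in deps]

print(enrichment_order(deps))
```

A circular dependency would surface as a `graphlib.CycleError`, which is probably the behaviour a configuration validator would want.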

Thanks
Carolyn




-------- Original message --------
From: Nick Allen <nick@nickallen.org>
Date: 1/9/17 8:49 AM (GMT-08:00)
To: dev@metron.incubator.apache.org
Subject: Re: Enrich enrichment

I agree that making it easy for the user to "enrich enrichments", as Dima
put it, to an arbitrary depth, would be extremely useful for a lot of use
cases. We've discussed the use case a little in the past in this thread
[1].

Re-purposing the "threat intel" phase gives us something that is feasible
today, but only to a "depth" of 2.  We would also need to rename and
redocument it so that users understand how they can leverage the two
phases.  This seems like a minimally viable option if we want to head down
this road.

The other extreme might involve inferring the topology needed based on the
user's configuration. If the user needs 3 phases, then we build a topology
that supports 3 phases.  Under the covers, instead of using Flux, we would
use Storm's topology builder Java API to grok the configuration and build
the topology (or topologies) that the user needs.

I am not sure if we can infer this from the configuration as it exists
today or if we would need to redefine the configuration somehow.  Like I
said this is "extreme", but could give the user more expressive and
intuitive options.
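
As a rough sketch of what "inferring the phases" could mean, the Python below groups enrichments into phases from a dependency map. Everything here is an assumption for illustration: the `deps` structure, the enrichment names, and the idea that dependencies are available explicitly. It says nothing about how Storm's Java API would be driven.

```python
from graphlib import TopologicalSorter

def infer_phases(deps):
    """Group enrichments into phases; each phase depends only on earlier ones."""
    # Drop raw message fields so only enrichment-to-enrichment edges remain.
    graph = {name: {d for d in needs if d in deps} for name, needs in deps.items()}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    phases = []
    while sorter.is_active():
        ready = sorter.get_ready()   # everything whose dependencies are done
        phases.append(sorted(ready))
        sorter.done(*ready)
    return phases

# Hypothetical configuration with three levels of dependency.
deps = {
    "geo": {"destination_ip"},
    "host": {"destination_ip"},
    "city_crime_level": {"geo"},
    "risk_note": {"city_crime_level", "host"},
}
print(infer_phases(deps))
```

The number of inner lists is the number of phases the topology would need; enrichments within one list have no dependencies on each other and could run in parallel.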




---
[1]
http://mail-archives.apache.org/mod_mbox/incubator-metron-dev/201610.mbox/%3CCAHSJ8NwJUiyp3YO6NVE4tfLoSSkOc6QG%2BMsAJSSDu%2B-wfct_vw%40mail.gmail.com%3E



On Mon, Jan 9, 2017 at 10:56 AM, Casey Stella <cestella@gmail.com> wrote:

> I think that would be a good feature to add to have arbitrary number of
> phases, though it might be tricky to code (the way I envisioned it would
> involve a loop in storm, which is possible[1]), might have unintended
> consequences to guarantees (e.g. updating enrichments might not be able to
> be applied in realtime) and could be tricky to reason about
> performance-wise.
>
> As it stands, the number of phases is a consequence of the topology
> itself.  We do not currently have an architecture which would allow an
> arbitrary number of phases without changing the flux file itself.  What you
> can do, though, in a stellar enrichment is stack enrichments (e.g. depend
> on previous enrichments) because it's just a list of stellar statements.
> The consequence, of course, is that these statements get run within the
> same worker, which is unfortunate, but may be a stopgap workaround.
>
> *1. https://groups.google.com/forum/#!topic/storm-user/EjN1hU58Q_8
>
> On Mon, Jan 9, 2017 at 10:48 AM, Otto Fowler <ottobackwards@gmail.com> wrote:
>
> > Maybe the naming of the phases is misleading?  What if you could set up an
> > arbitrary number of stages, with defaults?
> >
> >
> > On January 8, 2017 at 16:25:01, Casey Stella (cestella@gmail.com) wrote:
> >
> > You could do the geo enrichment normally and do a stellar hbase enrichment
> > in the threat Intel phase.
> >
> > On Sun, Jan 8, 2017 at 16:22 Ryan Merriman <merrimanr@gmail.com> wrote:
> >
> > > Hbase enrichments and geo enrichments are done in parallel so I would not
> > > expect this to work. You could do the Hbase enrichment as a threat Intel
> > > enrichment and that should work because enrichments and threat Intel are
> > > done in series.
> > >
> > > The ideal way would be to chain together Stellar enrichments but I don't
> > > think there is a geo enrichment function created yet. I think that should
> > > be a Jira. I know someone is working on an update to how we do geo
> > > enrichments so I will file a follow on Jira if it's not included in the
> > > scope of that work.
> > >
> > > Ryan
> > >
> > > > On Jan 8, 2017, at 2:31 PM, Dima Kovalyov <Dima.Kovalyov@sstech.us> wrote:
> > > >
> > > > > Is it possible to enrich enrichment?
> > > > >
> > > > > For example I have IP address, I enrich it with geo and get City name,
> > > > > now I want to enrich City name with city crime level (assume I have that
> > > > > data). But when I do that it just does not work. I specify enrichment
> > > > > like that:
> > > > >> {
> > > > >>   "index" : "msexchange",
> > > > >>   "batchSize" : 5,
> > > > >>   "enrichment" : {
> > > > >>     "fieldMap" : {
> > > > >>       "geo" : [ "destination_ip", "source_ip" ],
> > > > >>       "hbaseEnrichment" : [ "enrichments.geo.destination_ip.country" ],
> > > > >>       "hbaseEnrichment" : [ "enrichments:geo:destination_ip:country" ],
> > > > >>       "hbaseEnrichment" : [ "enrichments.geo.destination_ip:country" ]
> > > > >>     },
> > > > >>     "fieldToTypeMap" : {
> > > > >>       "enrichments.geo.destination_ip.country" : [ "city_crime_level" ],
> > > > >>       "enrichments:geo:destination_ip:country" : [ "city_crime_level" ],
> > > > >>       "enrichments.geo.destination_ip:country" : [ "city_crime_level" ]
> > > > >>     },
> > > > >>     "config" : { }
> > > > >>   },
> > > > >>   "threatIntel" : {
> > > > >>     "fieldMap" : { },
> > > > >>     "fieldToTypeMap" : { },
> > > > >>     "config" : { },
> > > > >>     "triageConfig" : {
> > > > >>       "riskLevelRules" : { },
> > > > >>       "aggregator" : "MAX",
> > > > >>       "aggregationConfig" : { }
> > > > >>     }
> > > > >>   },
> > > > >>   "configuration" : { }
> > > > >> }
> > > > > I tried all the ways how enrichment field can be entered just to be sure
> > > > > I do not mistype it.
> > > > >
> > > > > - Dima
> > >
> >
> >
>



--
Nick Allen <nick@nickallen.org>
