metron-dev mailing list archives

From Simon Elliston Ball <si...@simonellistonball.com>
Subject Re: Writing enrichment data directly from NiFi with PutHBaseJSON
Date Tue, 05 Jun 2018 19:31:53 GMT
Also, I expect the bundle would be part of the Metron project, so the NiFi release
cadence shouldn't matter much, now that NiFi can version processors independently.

Simon 

> On 5 Jun 2018, at 20:14, Casey Stella <cestella@gmail.com> wrote:
> 
> I agree with Simon here, the benefit of providing NiFi tooling is to enable NiFi to use
> our infrastructure (e.g. our parsers, MaaS, stellar enrichments, etc).  This would tie it
> to Metron pretty closely.
> 
>> On Tue, Jun 5, 2018 at 3:12 PM Otto Fowler <ottobackwards@gmail.com> wrote:
>> NiFi releases more often than Metron does, that might be an issue.
>> 
>> 
>> On June 5, 2018 at 14:07:22, Simon Elliston Ball (
>> simon@simonellistonball.com) wrote:
>> 
>> To be honest, I would expect this to be heavily linked to the Metron
>> releases, since it's going to use other metron classes and dependencies to
>> ensure compatibility. For example, a Stellar NiFi processor will be linked
>> to Metron's stellar-common, the enrichment loader will depend on key
>> construction code from metron-enrichment (and should align to it). I was
>> also considering an opinionated PublishMetron which linked to the Metron
>> kafka, and hid some of the dances you have to do to make the readMetadata
>> functions work (i.e. some sugar around our mild abuse of kafka keys,
>> which prevents people hurting their kafka by choosing the wrong
>> partitioner).
>> 
>> To that extent, I think the releases belong with Metron releases, though of
>> course that does increase our release and test burden.
>> 
>> On 5 June 2018 at 10:55, Otto Fowler <ottobackwards@gmail.com> wrote:
>> 
>> > Similar to Bro, we may need to release out of cycle.
>> >
>> >
>> >
>> > On June 5, 2018 at 13:17:55, Simon Elliston Ball (
>> > simon@simonellistonball.com) wrote:
>> >
>> > Do you mean in the sense of a separate module, or are you suggesting we go
>> > as far as a sub-project?
>> >
>> > On 5 June 2018 at 10:08, Otto Fowler <ottobackwards@gmail.com> wrote:
>> >
>> > > If we do that, we should have it as a separate component maybe.
>> > >
>> > >
>> > > On June 5, 2018 at 12:42:57, Simon Elliston Ball (
>> > > simon@simonellistonball.com) wrote:
>> > >
>> > > @otto, well, of course we would use the record api... it's great.
>> > >
>> > > @casey, I have actually written a stellar processor, which applies stellar
>> > > to all FlowFile attributes outputting the resulting stellar variable space
>> > > to either attributes or as json in the content.
>> > >
>> > > Is it worth us creating a nifi-metron-bundle? Happy to kick that off,
>> > > since I'm halfway there.
>> > >
>> > > Simon
>> > >
>> > >
>> > >
>> > > On 5 June 2018 at 08:41, Otto Fowler <ottobackwards@gmail.com> wrote:
>> > >
>> > > > We have jiras about ‘diverting’ and reading from nifi flows already
>> > > >
>> > > >
>> > > > On June 5, 2018 at 11:11:45, Casey Stella (cestella@gmail.com) wrote:
>> > > >
>> > > > I'd be in strong support of that, Simon. I think we should have some other
>> > > > NiFi components in Metron to enable users to interact with our
>> > > > infrastructure from NiFi (e.g. being able to transform via stellar, etc).
>> > > >
>> > > > On Tue, Jun 5, 2018 at 10:32 AM Simon Elliston Ball <
>> > > > simon@simonellistonball.com> wrote:
>> > > >
>> > > > > Do we, the community, think it would be a good idea to create a
>> > > > > PutMetronEnrichment NiFi processor for this use case? It seems a number of
>> > > > > people want to use NiFi to manage and schedule loading of enrichments for
>> > > > > example.
>> > > > >
>> > > > > Simon
>> > > > >
>> > > > > On 5 June 2018 at 06:56, Casey Stella <cestella@gmail.com> wrote:
>> > > > >
>> > > > > > The problem, as you correctly diagnosed, is the key in HBase. We construct
>> > > > > > the key very specifically in Metron, so it's unlikely to work out of the
>> > > > > > box with the NiFi processor unfortunately. The key that we use is formed
>> > > > > > here in the codebase:
>> > > > > > https://github.com/cestella/incubator-metron/blob/master/metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/converter/EnrichmentKey.java#L51
>> > > > > >
>> > > > > > To put that in English, consider the following:
>> > > > > >
>> > > > > > - type - the enrichment type
>> > > > > > - indicator - the indicator to use
>> > > > > > - hash(*) - a Murmur3 128-bit hash function
>> > > > > >
>> > > > > > The key is hash(indicator) + type + indicator
>> > > > > >
>> > > > > > This hash prefixing is a standard practice in HBase key design that allows
>> > > > > > the keys to be uniformly distributed among the regions and prevents
>> > > > > > hotspotting. Depending on how the PutHBaseJSON processor works, if you can
>> > > > > > construct the key and pass it in, then you might be able to either
>> > > > > > construct the key in NiFi or write a processor to construct the key.
>> > > > > > Ultimately though, what Carolyn said is true: the easiest approach is
>> > > > > > probably using the flatfile loader.
>> > > > > > If you do get this working in NiFi, however, do please let us know and/or
>> > > > > > consider contributing it back to the project as a PR :)
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > On Fri, Jun 1, 2018 at 6:26 AM Charles Joynt <Charles.Joynt@gresearch.co.uk> wrote:
>> > > > > >
>> > > > > > > Hello,
>> > > > > > >
>> > > > > > > I work as a Dev/Ops Data Engineer within the security team at a company
>> > > > > > > in London where we are in the process of implementing Metron. I have
>> > > > > > > been tasked with implementing feeds of network environment data into
>> > > > > > > HBase so that this data can be used as enrichment sources for our
>> > > > > > > security events. First-off I wanted to pull in DNS data for an internal
>> > > > > > > domain.
>> > > > > > >
>> > > > > > > I am assuming that I need to write data into HBase in such a way that it
>> > > > > > > exactly matches what I would get from the flatfile_loader.sh script. A
>> > > > > > > colleague of mine has already loaded some DNS data using that script, so
>> > > > > > > I am using that as a reference.
>> > > > > > >
>> > > > > > > I have implemented a flow in NiFi which takes JSON data from a HTTP
>> > > > > > > listener and routes it to a PutHBaseJSON processor. The flow is working,
>> > > > > > > in the sense that data is successfully written to HBase, but despite
>> > > > > > > (naively) specifying "Row Identifier Encoding Strategy = Binary", the
>> > > > > > > results in HBase don't look correct. Comparing the output from HBase
>> > > > > > > scan commands I see:
>> > > > > > >
>> > > > > > > flatfile_loader.sh produced:
>> > > > > > >
>> > > > > > > ROW: \xFF\xFE\xCB\xB8\xEF\x92\xA3\xD9#xC\xF9\xAC\x0Ap\x1E\x00\x05whois\x00\x0E192.168.0.198
>> > > > > > > CELL: column=data:v, timestamp=1516896203840, value={"clientname":"server.domain.local","clientip":"192.168.0.198"}
>> > > > > > >
>> > > > > > > PutHBaseJSON produced:
>> > > > > > >
>> > > > > > > ROW: server.domain.local
>> > > > > > > CELL: column=dns:v, timestamp=1527778603783, value={"name":"server.domain.local","type":"A","data":"192.168.0.198"}
>> > > > > > >
>> > > > > > > From source JSON:
>> > > > > > >
>> > > > > > > {"k":"server.domain.local","v":{"name":"server.domain.local","type":"A","data":"192.168.0.198"}}
>> > > > > > >
>> > > > > > > I know that there are some differences in column family / field names,
>> > > > > > > but my worry is the ROW id. Presumably I need to encode my row key, "k"
>> > > > > > > in the JSON data, in a way that matches how the flatfile_loader.sh
>> > > > > > > script did it.
>> > > > > > >
>> > > > > > > Can anyone explain how I might convert my Id to the correct format?
>> > > > > > > -or-
>> > > > > > > Does this matter? Can Metron use the human-readable ROW ids?
>> > > > > > >
>> > > > > > > Charlie Joynt
>> > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > --
>> > > > > simon elliston ball
>> > > > > @sireb
>> > > > >
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > > --
>> > > simon elliston ball
>> > > @sireb
>> > >
>> > >
>> >
>> >
>> > --
>> > --
>> > simon elliston ball
>> > @sireb
>> >
>> >
>> 
>> 
>> -- 
>> -- 
>> simon elliston ball
>> @sireb
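
For anyone following the thread, here is a minimal Python sketch of the row
key layout Casey describes in the quoted message: hash(indicator) + type +
indicator. It rests on assumptions that the thread does not confirm. The
function names (length_prefixed, stand_in_hash, build_row_key) are made up
for illustration. The two-byte big-endian length in front of each string is
inferred from the scan output quoted above (\x00\x05whois), not from the
Metron source. The hash here is BLAKE2b truncated to 16 bytes, used only as
a stand-in; it is NOT Murmur3, so the prefix bytes will not match what
flatfile_loader.sh writes.

```python
import hashlib
import struct
from typing import Callable


def length_prefixed(text: str) -> bytes:
    """Encode text as a 2-byte big-endian length followed by its UTF-8 bytes.

    Assumption: this framing is inferred from the HBase scan output in the
    thread (e.g. b"\\x00\\x05whois"); it is not confirmed against Metron.
    """
    data = text.encode("utf-8")
    if len(data) > 0xFFFF:
        raise ValueError("value too long for a 2-byte length prefix")
    return struct.pack(">H", len(data)) + data


def stand_in_hash(data: bytes) -> bytes:
    """Placeholder 16-byte hash (BLAKE2b). NOT the Murmur3 hash Metron uses."""
    return hashlib.blake2b(data, digest_size=16).digest()


def build_row_key(
    enrichment_type: str,
    indicator: str,
    hash_fn: Callable[[bytes], bytes] = stand_in_hash,
) -> bytes:
    """Lay out a key as hash(indicator) + type + indicator."""
    prefix = hash_fn(indicator.encode("utf-8"))
    if len(prefix) != 16:
        raise ValueError("hash_fn must return 16 bytes (128 bits)")
    return prefix + length_prefixed(enrichment_type) + length_prefixed(indicator)


if __name__ == "__main__":
    key = build_row_key("whois", "192.168.0.198")
    # First 16 bytes are the hash prefix; the rest is the typed indicator.
    print(key[:16].hex(), key[16:])
```

To produce keys that Metron can actually look up, hash_fn would have to be
replaced with the same Murmur3 128-bit function (and seed, if any) that
EnrichmentKey.java uses, and the framing checked against that class.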
