metron-dev mailing list archives

From Otto Fowler <ottobackwa...@gmail.com>
Subject Re: Writing enrichment data directly from NiFi with PutHBaseJSON
Date Tue, 05 Jun 2018 17:55:15 GMT
Similar to Bro, we may need to release out of cycle.



On June 5, 2018 at 13:17:55, Simon Elliston Ball (
simon@simonellistonball.com) wrote:

Do you mean in the sense of a separate module, or are you suggesting we go
as far as a sub-project?

On 5 June 2018 at 10:08, Otto Fowler <ottobackwards@gmail.com> wrote:

> If we do that, we should have it as a separate component maybe.
>
>
> On June 5, 2018 at 12:42:57, Simon Elliston Ball (
> simon@simonellistonball.com) wrote:
>
> @otto, well, of course we would use the record api... it's great.
>
> @casey, I have actually written a stellar processor, which applies
> stellar to all FlowFile attributes outputting the resulting stellar
> variable space to either attributes or as json in the content.
>
> Is it worth us creating a nifi-metron-bundle? Happy to kick that off,
> since I'm halfway there.
>
> Simon
>
>
>
> On 5 June 2018 at 08:41, Otto Fowler <ottobackwards@gmail.com> wrote:
>
> > We have jiras about ‘diverting’ and reading from nifi flows already
> >
> >
> > On June 5, 2018 at 11:11:45, Casey Stella (cestella@gmail.com) wrote:
> >
> > I'd be in strong support of that, Simon. I think we should have some
> > other NiFi components in Metron to enable users to interact with our
> > infrastructure from NiFi (e.g. being able to transform via stellar,
> > etc).
> >
> > On Tue, Jun 5, 2018 at 10:32 AM Simon Elliston Ball <
> > simon@simonellistonball.com> wrote:
> >
> > > Do we, the community, think it would be a good idea to create a
> > > PutMetronEnrichment NiFi processor for this use case? It seems a
> > > number of people want to use NiFi to manage and schedule loading of
> > > enrichments for example.
> > >
> > > Simon
> > >
> > > On 5 June 2018 at 06:56, Casey Stella <cestella@gmail.com> wrote:
> > >
> > > > The problem, as you correctly diagnosed, is the key in HBase. We
> > > > construct the key very specifically in Metron, so it's unlikely to
> > > > work out of the box with the NiFi processor unfortunately. The key
> > > > that we use is formed here in the codebase:
> > > > https://github.com/cestella/incubator-metron/blob/master/metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/converter/EnrichmentKey.java#L51
> > > >
> > > > To put that in English, consider the following:
> > > >
> > > > - type - the enrichment type
> > > > - indicator - the indicator to use
> > > > - hash(*) - a Murmur3 128-bit hash function
> > > >
> > > > The key is hash(indicator) + type + indicator
> > > >
> > > > This hash prefixing is a standard practice in HBase key design that
> > > > allows the keys to be uniformly distributed among the regions and
> > > > prevents hotspotting. Depending on how the PutHBaseJSON processor
> > > > works, if you can construct the key and pass it in, then you might
> > > > be able to either construct the key in NiFi or write a processor to
> > > > construct the key. Ultimately though, what Carolyn said is true: the
> > > > easiest approach is probably using the flatfile loader.
> > > > If you do get this working in NiFi, however, do please let us know
> > > > and/or consider contributing it back to the project as a PR :)
> > > >
> > > >
> > > >
> > > > On Fri, Jun 1, 2018 at 6:26 AM Charles Joynt <
> > > > Charles.Joynt@gresearch.co.uk> wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > I work as a Dev/Ops Data Engineer within the security team at a
> > > > > company in London where we are in the process of implementing
> > > > > Metron. I have been tasked with implementing feeds of network
> > > > > environment data into HBase so that this data can be used as
> > > > > enrichment sources for our security events. First-off I wanted to
> > > > > pull in DNS data for an internal domain.
> > > > >
> > > > > I am assuming that I need to write data into HBase in such a way
> > > > > that it exactly matches what I would get from the
> > > > > flatfile_loader.sh script. A colleague of mine has already loaded
> > > > > some DNS data using that script, so I am using that as a
> > > > > reference.
> > > > >
> > > > > I have implemented a flow in NiFi which takes JSON data from a
> > > > > HTTP listener and routes it to a PutHBaseJSON processor. The flow
> > > > > is working, in the sense that data is successfully written to
> > > > > HBase, but despite (naively) specifying "Row Identifier Encoding
> > > > > Strategy = Binary", the results in HBase don't look correct.
> > > > > Comparing the output from HBase scan commands I see:
> > > > >
> > > > > flatfile_loader.sh produced:
> > > > >
> > > > > ROW:
> > > > > \xFF\xFE\xCB\xB8\xEF\x92\xA3\xD9#xC\xF9\xAC\x0Ap\x1E\x00\x05whois\x00\x0E192.168.0.198
> > > > > CELL: column=data:v, timestamp=1516896203840,
> > > > > value={"clientname":"server.domain.local","clientip":"192.168.0.198"}
> > > > >
> > > > > PutHBaseJSON produced:
> > > > >
> > > > > ROW: server.domain.local
> > > > > CELL: column=dns:v, timestamp=1527778603783,
> > > > > value={"name":"server.domain.local","type":"A","data":"192.168.0.198"}
> > > > >
> > > > > From source JSON:
> > > > >
> > > > >
> > > > > {"k":"server.domain.local","v":{"name":"server.domain.local","type":"A","data":"192.168.0.198"}}
> > > > >
> > > > > I know that there are some differences in column family / field
> > > > > names, but my worry is the ROW id. Presumably I need to encode my
> > > > > row key, "k" in the JSON data, in a way that matches how the
> > > > > flatfile_loader.sh script did it.
> > > > >
> > > > > Can anyone explain how I might convert my Id to the correct
> > > > > format?
> > > > > -or-
> > > > > Does this matter? Can Metron use the human-readable ROW ids?
> > > > >
> > > > > Charlie Joynt
> > > > >
> > > > > --------------
> > > > > G-RESEARCH believes the information provided herein is reliable.
> > > > > While every care has been taken to ensure accuracy, the
> > > > > information is furnished to the recipients with no warranty as to
> > > > > the completeness and accuracy of its contents and on condition
> > > > > that any errors or omissions shall not be made the basis of any
> > > > > claim, demand or cause of action.
> > > > > The information in this email is intended only for the named
> > > > > recipient. If you are not the intended recipient please notify us
> > > > > immediately and do not copy, distribute or take action based on
> > > > > this e-mail.
> > > > > All messages sent to and from this e-mail address will be logged
> > > > > by G-RESEARCH and are subject to archival storage, monitoring,
> > > > > review and disclosure.
> > > > > G-RESEARCH is the trading name of Trenchant Limited, 5th Floor,
> > > > > Whittington House, 19-30 Alfred Place, London WC1E 7EA.
> > > > > Trenchant Limited is a company registered in England with company
> > > > > number 08127121.
> > > > > --------------
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > --
> > > simon elliston ball
> > > @sireb
> > >
> >
>
>
>
> --
> --
> simon elliston ball
> @sireb
>
>


-- 
-- 
simon elliston ball
@sireb
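
For anyone following the thread, the row key layout Casey describes
(hash(indicator) + type + indicator) can be sketched roughly as below.
This is only an illustration, not Metron's actual code. It assumes the
type and indicator are each written as a 2-byte big-endian length
followed by the text, which is what the \x00\x05whois fragment in the
scan output above suggests, and that the hash is taken over the UTF-8
bytes of the indicator. The function names are made up for this sketch.
The hash is passed in as a parameter, and the placeholder used here is
NOT Murmur3, so the keys it produces will not match what
flatfile_loader.sh writes. A real Murmur3 128-bit implementation would
have to be supplied, and its seed and byte order checked against
EnrichmentKey.java.

```python
import hashlib
import struct
from typing import Callable


def length_prefixed(text: str) -> bytes:
    """Encode text as a 2-byte big-endian length followed by its UTF-8 bytes.

    Assumption: this mirrors the \\x00\\x05whois pattern seen in the HBase scan.
    """
    raw = text.encode("utf-8")
    return struct.pack(">H", len(raw)) + raw


def build_row_key(enrichment_type: str, indicator: str,
                  hash_fn: Callable[[bytes], bytes]) -> bytes:
    """Assemble hash(indicator) + type + indicator as a single byte string."""
    prefix = hash_fn(indicator.encode("utf-8"))
    return prefix + length_prefixed(enrichment_type) + length_prefixed(indicator)


def placeholder_hash(data: bytes) -> bytes:
    """Stand-in 16-byte hash for demonstration only.

    This is NOT Murmur3; swap in a real Murmur3 128-bit function
    before comparing keys with Metron.
    """
    return hashlib.sha256(data).digest()[:16]


key = build_row_key("whois", "192.168.0.198", placeholder_hash)
print(len(key), key[16:])
```

The first 16 bytes of the result are the hash prefix and the rest is the
readable type and indicator, so the overall shape can be compared with a
row written by the flatfile loader even before the hash itself matches.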
