metron-dev mailing list archives

From Simon Elliston Ball <si...@simonellistonball.com>
Subject Re: Writing enrichment data directly from NiFi with PutHBaseJSON
Date Tue, 05 Jun 2018 14:32:30 GMT
Do we, the community, think it would be a good idea to create a
PutMetronEnrichment NiFi processor for this use case? A number of people
seem to want to use NiFi to manage and schedule the loading of enrichments,
for example.
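
To make the discussion concrete, here is a rough, untested-against-Metron
sketch of the key layout Casey describes in the quoted message below. It
assumes, going only by the sample row in Charlie's scan output, that the key
is a 16-byte hash prefix followed by the type and the indicator, each written
as a 2-byte big-endian length and then the string bytes. The function names
are hypothetical, and placeholder_hash is a stand-in only: a real version
would need the same Murmur3 128-bit function (and seed) that Metron uses,
which this sketch does not reproduce.

```python
import struct
from typing import Callable


def length_prefixed(text: str) -> bytes:
    """Return a 2-byte big-endian length followed by the UTF-8 bytes of text.

    For plain ASCII strings this is the same byte layout that Java's
    DataOutputStream.writeUTF produces.
    """
    data = text.encode("utf-8")
    if len(data) > 0xFFFF:
        raise ValueError("string too long for a 2-byte length prefix")
    return struct.pack(">H", len(data)) + data


def placeholder_hash(data: bytes) -> bytes:
    """ASSUMPTION: stand-in only. Returns 16 zero bytes, not a real hash."""
    return bytes(16)


def build_row_key(
    enrichment_type: str,
    indicator: str,
    hash_fn: Callable[[bytes], bytes] = placeholder_hash,
) -> bytes:
    """Assemble hash(indicator) + type + indicator as one byte string."""
    prefix = hash_fn(indicator.encode("utf-8"))
    return prefix + length_prefixed(enrichment_type) + length_prefixed(indicator)


# Example: everything after the 16-byte prefix is the length-prefixed
# type and indicator.
key = build_row_key("whois", "192.168.0.198")
print(key[16:])
```

With a real hash function plugged in as hash_fn, the resulting bytes would
be what a processor hands to HBase as the row id, but only a comparison with
a row written by flatfile_loader.sh would confirm the layout.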

Simon

On 5 June 2018 at 06:56, Casey Stella <cestella@gmail.com> wrote:

> The problem, as you correctly diagnosed, is the key in HBase.  We construct
> the key very specifically in Metron, so it's unlikely to work out of the
> box with the NiFi processor unfortunately.  The key that we use is formed
> here in the codebase:
> https://github.com/cestella/incubator-metron/blob/master/metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/converter/EnrichmentKey.java#L51
>
> To put that in English, consider the following:
>
>    - type - the enrichment type
>    - indicator - the indicator to use
>    - hash(*) - a Murmur3 128-bit hash function
>
> The key is hash(indicator) + type + indicator
>
> This hash prefixing is a standard practice in HBase key design that allows
> the keys to be uniformly distributed among the regions and prevents
> hotspotting.  If the PutHBaseJSON processor lets you pass in a pre-built
> row key, then you might be able to either construct the key in NiFi or
> write a processor to construct it.
> Ultimately though, what Carolyn said is true: the easiest approach is
> probably using the flatfile loader.
> If you do get this working in NiFi, however, do please let us know and/or
> consider contributing it back to the project as a PR :)
>
>
>
> On Fri, Jun 1, 2018 at 6:26 AM Charles Joynt <Charles.Joynt@gresearch.co.uk> wrote:
>
> > Hello,
> >
> > I work as a Dev/Ops Data Engineer within the security team at a company in
> > London where we are in the process of implementing Metron. I have been
> > tasked with implementing feeds of network environment data into HBase so
> > that this data can be used as enrichment sources for our security events.
> > First off, I wanted to pull in DNS data for an internal domain.
> >
> > I am assuming that I need to write data into HBase in such a way that it
> > exactly matches what I would get from the flatfile_loader.sh script. A
> > colleague of mine has already loaded some DNS data using that script, so I
> > am using that as a reference.
> >
> > I have implemented a flow in NiFi which takes JSON data from an HTTP
> > listener and routes it to a PutHBaseJSON processor. The flow is working, in
> > the sense that data is successfully written to HBase, but despite (naively)
> > specifying "Row Identifier Encoding Strategy = Binary", the results in
> > HBase don't look correct. Comparing the output from HBase scan commands I
> > see:
> >
> > flatfile_loader.sh produced:
> >
> > ROW:
> > \xFF\xFE\xCB\xB8\xEF\x92\xA3\xD9#xC\xF9\xAC\x0Ap\x1E\x00\x05whois\x00\x0E192.168.0.198
> > CELL: column=data:v, timestamp=1516896203840,
> > value={"clientname":"server.domain.local","clientip":"192.168.0.198"}
> >
> > PutHBaseJSON produced:
> >
> > ROW:  server.domain.local
> > CELL: column=dns:v, timestamp=1527778603783,
> > value={"name":"server.domain.local","type":"A","data":"192.168.0.198"}
> >
> > From source JSON:
> >
> >
> > {"k":"server.domain.local","v":{"name":"server.domain.local","type":"A","data":"192.168.0.198"}}
> >
> > I know that there are some differences in column family / field names, but
> > my worry is the ROW id. Presumably I need to encode my row key, "k" in the
> > JSON data, in a way that matches how the flatfile_loader.sh script did it.
> >
> > Can anyone explain how I might convert my Id to the correct format?
> > -or-
> > Does this matter? Can Metron use the human-readable ROW ids?
> >
> > Charlie Joynt
> >
> >
>



-- 
--
simon elliston ball
@sireb
