metron-dev mailing list archives

From Simon Elliston Ball <si...@simonellistonball.com>
Subject Re: Writing enrichment data directly from NiFi with PutHBaseJSON
Date Tue, 12 Jun 2018 17:08:43 GMT
Good solution. The streaming enrichment writer makes a lot of sense for
this, especially if you're not using huge enrichment sources that need the
batch-based loaders.

As it happens I have written most of a NiFi processor to handle this use
case directly - both non-record and Record-based, especially for Otto :).
The one thing we need to figure out now is where to host that, and how to
handle releases of a nifi-metron-bundle. I'll probably get round to putting
the code in my github at least in the next few days, while we figure out a
more permanent home.

Charlie, out of curiosity, what didn't you like about the flatfile loader
script?

Simon

On 12 June 2018 at 18:00, Charles Joynt <Charles.Joynt@gresearch.co.uk>
wrote:

> Thanks for the responses. I appreciate the willingness to look at creating
> a NiFi processor. That would be great!
>
> Just to follow up on this (after a week looking after the "ops" side of
> dev-ops): I really don't want to have to use the flatfile loader script,
> and I'm not going to be able to write a Metron-style HBase key generator
> any time soon, but I have had some success with a different approach.
>
> 1. Generate data in CSV format, e.g.
>    "server.domain.local","A","192.168.0.198"
> 2. Send this to an HTTP listener in NiFi
> 3. Write to a kafka topic
>
> I then followed your instructions in this blog:
> https://cwiki.apache.org/confluence/display/METRON/2016/06/16/Metron+Tutorial+-+Fundamentals+Part+6%3A+Streaming+Enrichment
>
> 4. Create a new "dns" sensor in Metron
> 5. Use the CSVParser and SimpleHbaseEnrichmentWriter, and parserConfig
> settings to push this into HBase:
>
> {
>         "parserClassName": "org.apache.metron.parsers.csv.CSVParser",
>         "writerClassName": "org.apache.metron.enrichment.writer.SimpleHbaseEnrichmentWriter",
>         "sensorTopic": "dns",
>         "parserConfig": {
>                 "shew.table": "dns",
>                 "shew.cf": "dns",
>                 "shew.keyColumns": "name",
>                 "shew.enrichmentType": "dns",
>                 "columns": {
>                         "name": 0,
>                         "type": 1,
>                         "data": 2
>                 }
>         }
> }
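
To make the "columns" and "shew.keyColumns" settings above concrete, here is a minimal Python sketch of how one CSV line from step 1 could be split into named fields and an indicator. It is an illustration only, not Metron code: the function names and the KEY_DELIM value are assumptions made for this example, and what the writer actually stores as the enrichment value is not covered here.

```python
import csv
import io

# Mirrors the "columns" and "shew.keyColumns" entries of the parserConfig.
COLUMNS = {"name": 0, "type": 1, "data": 2}
KEY_COLUMNS = ["name"]
KEY_DELIM = ":"  # assumption: only matters when there are several key columns


def to_csv_line(fields):
    """Quote every field, as in the example record from step 1."""
    buf = io.StringIO()
    csv.writer(buf, quoting=csv.QUOTE_ALL).writerow(fields)
    return buf.getvalue().rstrip("\r\n")


def split_record(line):
    """Return (indicator, record) for one CSV line."""
    row = next(csv.reader([line]))
    record = {name: row[index] for name, index in COLUMNS.items()}
    indicator = KEY_DELIM.join(record[name] for name in KEY_COLUMNS)
    return indicator, record


line = to_csv_line(["server.domain.local", "A", "192.168.0.198"])
indicator, record = split_record(line)
```

With the single key column "name", the indicator is simply the host name, which is the value a later enrichment lookup would have to match.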
>
> And... it seems to be working. At least, I have data in HBase which looks
> more like the output of the flatfile loader.
>
> Charlie
>
> -----Original Message-----
> From: Casey Stella [mailto:cestella@gmail.com]
> Sent: 05 June 2018 14:56
> To: dev@metron.apache.org
> Subject: Re: Writing enrichment data directly from NiFi with PutHBaseJSON
>
> The problem, as you correctly diagnosed, is the key in HBase.  We
> construct the key very specifically in Metron, so it's unlikely to work out
> of the box with the NiFi processor unfortunately.  The key that we use is
> formed here in the codebase:
> https://github.com/cestella/incubator-metron/blob/master/metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/converter/EnrichmentKey.java#L51
>
> To put that in English, consider the following:
>
>    - type - the enrichment type
>    - indicator - the indicator to use
>    - hash(*) - a Murmur3 128-bit hash function
>
> The key is hash(indicator) + type + indicator
>
> This hash prefixing is a standard practice in HBase key design that allows
> the keys to be uniformly distributed among the regions and prevents
> hotspotting.  If the PutHBaseJSON processor lets you construct the key and
> pass it in, then you might be able to either construct the key in NiFi or
> write a processor to construct the key.
> Ultimately though, what Carolyn said is true: the easiest approach is
> probably using the flatfile loader.
> If you do get this working in NiFi, however, do please let us know and/or
> consider contributing it back to the project as a PR :)
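
The layout described above, hash(indicator) + type + indicator, can be sketched in Python as follows. This is a rough illustration under stated assumptions, not a port of EnrichmentKey.java. The 2-byte length prefix in front of type and indicator is inferred from the scan output quoted further down (\x00\x05whois). The standard library has no Murmur3, so the default hash here is a 16-byte BLAKE2b digest used purely as a stand-in: keys built with it will NOT match Metron's. The function names are made up for this example, and the exact bytes hashed and the seed should be checked against the linked source.

```python
import hashlib
import struct


def length_prefixed(text):
    """2-byte big-endian length followed by the UTF-8 bytes (assumed layout)."""
    raw = text.encode("utf-8")
    return struct.pack(">H", len(raw)) + raw


def stand_in_hash128(data):
    """Placeholder 128-bit hash. NOT Murmur3; swap in a real Murmur3 128-bit
    implementation to get anywhere near what Metron writes."""
    return hashlib.blake2b(data, digest_size=16).digest()


def enrichment_row_key(enrichment_type, indicator, hash128=stand_in_hash128):
    """Build hash(indicator) + type + indicator as one byte string."""
    prefix = hash128(indicator.encode("utf-8"))
    if len(prefix) != 16:
        raise ValueError("expected a 128-bit (16-byte) hash prefix")
    return prefix + length_prefixed(enrichment_type) + length_prefixed(indicator)


key = enrichment_row_key("whois", "192.168.0.198")
```

Because the prefix depends only on the indicator, a reader who knows the type and indicator can rebuild the full key for a point lookup, while the hash spreads neighbouring indicators across regions.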
>
>
>
> On Fri, Jun 1, 2018 at 6:26 AM Charles Joynt <
> Charles.Joynt@gresearch.co.uk>
> wrote:
>
> > Hello,
> >
> > I work as a Dev/Ops Data Engineer within the security team at a
> > company in London where we are in the process of implementing Metron.
> > I have been tasked with implementing feeds of network environment data
> > into HBase so that this data can be used as enrichment sources for our
> > security events.
> > First-off I wanted to pull in DNS data for an internal domain.
> >
> > I am assuming that I need to write data into HBase in such a way that
> > it exactly matches what I would get from the flatfile_loader.sh
> > script. A colleague of mine has already loaded some DNS data using
> > that script, so I am using that as a reference.
> >
> > I have implemented a flow in NiFi which takes JSON data from an HTTP
> > listener and routes it to a PutHBaseJSON processor. The flow is
> > working, in the sense that data is successfully written to HBase, but
> > despite (naively) specifying "Row Identifier Encoding Strategy =
> > Binary", the results in HBase don't look correct. Comparing the output
> > from HBase scan commands I see:
> >
> > flatfile_loader.sh produced:
> >
> > ROW:
> > \xFF\xFE\xCB\xB8\xEF\x92\xA3\xD9#xC\xF9\xAC\x0Ap\x1E\x00\x05whois\x00\x0E192.168.0.198
> > CELL: column=data:v, timestamp=1516896203840,
> > value={"clientname":"server.domain.local","clientip":"192.168.0.198"}
> >
> > PutHBaseJSON produced:
> >
> > ROW:  server.domain.local
> > CELL: column=dns:v, timestamp=1527778603783,
> > value={"name":"server.domain.local","type":"A","data":"192.168.0.198"}
> >
> > From source JSON:
> >
> > {"k":"server.domain.local","v":{"name":"server.domain.local","type":"A","data":"192.168.0.198"}}
> >
> > I know that there are some differences in column family / field names,
> > but my worry is the ROW id. Presumably I need to encode my row key,
> > "k" in the JSON data, in a way that matches how the flatfile_loader.sh
> > script did it.
> >
> > Can anyone explain how I might convert my Id to the correct format?
> > -or-
> > Does this matter? Can Metron use the human-readable ROW ids?
> >
> > Charlie Joynt
> >
> >
>



-- 
simon elliston ball
@sireb
