metron-dev mailing list archives

From Carolyn Duby <cd...@hortonworks.com>
Subject Re: Writing enrichment data directly from NiFi with PutHBaseJSON
Date Tue, 12 Jun 2018 19:33:13 GMT
I like the streaming enrichment solution, but it depends on how you are getting the data in.
If you get the data in a CSV file, just call the flatfile loader from a script processor.
No special NiFi processor is required.

If the enrichments don’t arrive in bulk, the streaming solution is better.

Thanks
Carolyn Duby
Solutions Engineer, Northeast
cduby@hortonworks.com
+1.508.965.0584

On 6/12/18, 1:08 PM, "Simon Elliston Ball" <simon@simonellistonball.com> wrote:

>Good solution. The streaming enrichment writer makes a lot of sense for
>this, especially if you're not using huge enrichment sources that need the
>batch based loaders.
>
>As it happens I have written most of a NiFi processor to handle this use
>case directly - both non-record and Record based, especially for Otto :).
>The one thing we need to figure out now is where to host that, and how to
>handle releases of a nifi-metron-bundle. I'll probably get round to putting
>the code in my github at least in the next few days, while we figure out a
>more permanent home.
>
>Charlie, out of curiosity, what didn't you like about the flatfile loader
>script?
>
>Simon
>
>On 12 June 2018 at 18:00, Charles Joynt <Charles.Joynt@gresearch.co.uk>
>wrote:
>
>> Thanks for the responses. I appreciate the willingness to look at creating
>> a NiFi processor. That would be great!
>>
>> Just to follow up on this (after a week looking after the "ops" side of
>> dev-ops): I really don't want to have to use the flatfile loader script,
>> and I'm not going to be able to write a Metron-style HBase key generator
>> any time soon, but I have had some success with a different approach.
>>
>> 1. Generate data in CSV format, e.g. "server.domain.local","A","192.168.0.198"
>> 2. Send this to an HTTP listener in NiFi
>> 3. Write to a Kafka topic
>>
>> I then followed your instructions in this blog:
>> https://cwiki.apache.org/confluence/display/METRON/2016/06/16/Metron+Tutorial+-+Fundamentals+Part+6%3A+Streaming+Enrichment
>>
>> 4. Create a new "dns" sensor in Metron
>> 5. Use the CSVParser and SimpleHbaseEnrichmentWriter, and parserConfig
>> settings to push this into HBase:
>>
>> {
>>         "parserClassName": "org.apache.metron.parsers.csv.CSVParser",
>>         "writerClassName": "org.apache.metron.enrichment.writer.SimpleHbaseEnrichmentWriter",
>>         "sensorTopic": "dns",
>>         "parserConfig": {
>>                 "shew.table": "dns",
>>                 "shew.cf": "dns",
>>                 "shew.keyColumns": "name",
>>                 "shew.enrichmentType": "dns",
>>                 "columns": {
>>                         "name": 0,
>>                         "type": 1,
>>                         "data": 2
>>                 }
>>         }
>> }
>>
>> And... it seems to be working. At least, I have data in HBase which looks
>> more like the output of the flatfile loader.
>>
>> Charlie
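
The column mapping in the parserConfig above can be illustrated with a small Python sketch. It is not Metron's CSVParser or SimpleHbaseEnrichmentWriter code; it only shows, under stated assumptions, how the "columns" and "shew.keyColumns" settings relate a CSV line to an enrichment entry. The names `COLUMNS`, `KEY_COLUMN`, `parse_line` and `to_enrichment` are hypothetical, and keeping the key column inside the stored value is an assumption.

```python
import csv
import io
import json

# Mirrors the "columns" and "shew.keyColumns" settings from the parserConfig.
COLUMNS = {"name": 0, "type": 1, "data": 2}
KEY_COLUMN = "name"


def parse_line(line: str) -> dict:
    """Split one CSV line and name its fields by column position."""
    row = next(csv.reader(io.StringIO(line)))
    return {field: row[index] for field, index in COLUMNS.items()}


def to_enrichment(line: str, enrichment_type: str = "dns") -> dict:
    """Pick the key column as the indicator and keep the full record as the value.

    Assumption: the key column stays in the value; the real writer may differ.
    """
    record = parse_line(line)
    return {
        "type": enrichment_type,
        "indicator": record[KEY_COLUMN],
        "value": record,
    }


entry = to_enrichment('"server.domain.local","A","192.168.0.198"')
print(json.dumps(entry))
```

With this mapping, the indicator used for lookups would be the DNS name, and the record type and address travel along as the enrichment value.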
>>
>> -----Original Message-----
>> From: Casey Stella [mailto:cestella@gmail.com]
>> Sent: 05 June 2018 14:56
>> To: dev@metron.apache.org
>> Subject: Re: Writing enrichment data directly from NiFi with PutHBaseJSON
>>
>> The problem, as you correctly diagnosed, is the key in HBase.  We
>> construct the key very specifically in Metron, so it's unlikely to work out
>> of the box with the NiFi processor unfortunately.  The key that we use is
>> formed here in the codebase:
>> https://github.com/cestella/incubator-metron/blob/master/metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/converter/EnrichmentKey.java#L51
>>
>> To put that in English, consider the following:
>>
>>    - type - the enrichment type
>>    - indicator - the indicator to use
>>    - hash(*) - a Murmur3 128-bit hash function
>>
>> The key is hash(indicator) + type + indicator
>>
>> This hash prefixing is a standard practice in HBase key design that allows
>> the keys to be uniformly distributed among the regions and prevents
>> hotspotting.  Depending on how the PutHBaseJSON processor works, if it lets
>> you pass in a pre-built key, then you might be able to either construct the
>> key in NiFi or write a processor to construct the key.
>> Ultimately though, what Carolyn said is true: the easiest approach is
>> probably using the flatfile loader.
>> If you do get this working in NiFi, however, do please let us know and/or
>> consider contributing it back to the project as a PR :)
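
The layout described above, hash(indicator) + type + indicator, can be sketched in Python as follows. This is a minimal illustration and not the EnrichmentKey implementation. Two things are assumptions: the 2-byte big-endian length before each string is inferred from the scan output quoted in this thread (for example `\x00\x05whois`), and `stand_in_hash` is a truncated SHA-256 used only so the sketch runs with the standard library. It is NOT Murmur3, so the keys it produces will not match the ones Metron writes. All function names are hypothetical.

```python
import hashlib
import struct
from typing import Callable


def stand_in_hash(data: bytes) -> bytes:
    """Placeholder 16-byte hash. NOT Murmur3; swap in a real Murmur3 128-bit hash."""
    return hashlib.sha256(data).digest()[:16]


def length_prefixed(text: str) -> bytes:
    """Encode a string as a 2-byte big-endian length followed by its UTF-8 bytes.

    Assumption: this framing is inferred from the HBase scan output in the thread.
    """
    raw = text.encode("utf-8")
    return struct.pack(">H", len(raw)) + raw


def build_row_key(
    enrichment_type: str,
    indicator: str,
    hash_fn: Callable[[bytes], bytes] = stand_in_hash,
) -> bytes:
    """Lay out the key as hash(indicator) + type + indicator."""
    prefix = hash_fn(indicator.encode("utf-8"))
    return prefix + length_prefixed(enrichment_type) + length_prefixed(indicator)


key = build_row_key("dns", "server.domain.local")
print(key)
```

Because the prefix depends only on the indicator, similar indicators such as consecutive IP addresses start with unrelated bytes, which is what spreads the rows across regions.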
>>
>>
>>
>> On Fri, Jun 1, 2018 at 6:26 AM Charles Joynt <
>> Charles.Joynt@gresearch.co.uk>
>> wrote:
>>
>> > Hello,
>> >
>> > I work as a Dev/Ops Data Engineer within the security team at a
>> > company in London where we are in the process of implementing Metron.
>> > I have been tasked with implementing feeds of network environment data
>> > into HBase so that this data can be used as enrichment sources for our
>> > security events.
>> > First off, I wanted to pull in DNS data for an internal domain.
>> >
>> > I am assuming that I need to write data into HBase in such a way that
>> > it exactly matches what I would get from the flatfile_loader.sh
>> > script. A colleague of mine has already loaded some DNS data using
>> > that script, so I am using that as a reference.
>> >
>> > I have implemented a flow in NiFi which takes JSON data from an HTTP
>> > listener and routes it to a PutHBaseJSON processor. The flow is
>> > working, in the sense that data is successfully written to HBase, but
>> > despite (naively) specifying "Row Identifier Encoding Strategy =
>> > Binary", the results in HBase don't look correct. Comparing the output
>> > from HBase scan commands I
>> > see:
>> >
>> > flatfile_loader.sh produced:
>> >
>> > ROW:
>> > \xFF\xFE\xCB\xB8\xEF\x92\xA3\xD9#xC\xF9\xAC\x0Ap\x1E\x00\x05whois\x00\x0E192.168.0.198
>> > CELL: column=data:v, timestamp=1516896203840,
>> > value={"clientname":"server.domain.local","clientip":"192.168.0.198"}
>> >
>> > PutHBaseJSON produced:
>> >
>> > ROW:  server.domain.local
>> > CELL: column=dns:v, timestamp=1527778603783,
>> > value={"name":"server.domain.local","type":"A","data":"192.168.0.198"}
>> >
>> > From source JSON:
>> >
>> >
>> > {"k":"server.domain.local","v":{"name":"server.domain.local","type":"A","data":"192.168.0.198"}}
>> >
>> > I know that there are some differences in column family / field names,
>> > but my worry is the ROW id. Presumably I need to encode my row key,
>> > "k" in the JSON data, in a way that matches how the flatfile_loader.sh
>> > script did it.
>> >
>> > Can anyone explain how I might convert my Id to the correct format?
>> > -or-
>> > Does this matter? Can Metron use the human-readable ROW ids?
>> >
>> > Charlie Joynt
>> >
>> >
>>
>
>
>
>-- 
>--
>simon elliston ball
>@sireb