metron-dev mailing list archives

From Carolyn Duby <cd...@hortonworks.com>
Subject Re: Writing enrichment data directly from NiFi with PutHBaseJSON
Date Wed, 13 Jun 2018 14:01:34 GMT
Agreed. Streaming enrichment is the right solution for DNS data.

Do we have a web service for writing enrichments?

Carolyn Duby
Solutions Engineer, Northeast
cduby@hortonworks.com
+1.508.965.0584

Join my team!
Enterprise Account Manager – Boston - http://grnh.se/wepchv1
Solutions Engineer – Boston - http://grnh.se/8gbxy41
Need Answers? Try https://community.hortonworks.com <https://community.hortonworks.com/answers/index.html>

On 6/13/18, 6:25 AM, "Charles Joynt" <Charles.Joynt@gresearch.co.uk> wrote:

>Regarding why I didn't choose to load data with the flatfile loader script...
>
>I want to be able to SEND enrichment data to Metron rather than have to set up cron jobs
>to PULL data. At the moment I'm trying to prove that the process works with a simple data
>source. In the future we will want enrichment data in Metron that comes from systems (e.g.
>HR databases) that I won't have access to, hence will need someone to be able to send us the
>data.
>
>> Carolyn: just call the flat file loader from a script processor...
>
>I didn't believe that would work in my environment. I'm pretty sure the script has dependencies
>on various Metron JARs, not least for the row id hashing algorithm. I suppose this would require
>at least a partial install of Metron alongside NiFi, and would introduce additional work on
>the NiFi cluster for any Metron upgrade. In some (enterprise) environments there might be
>separation of ownership between NiFi and Metron.
>
>I also prefer not to have a Java app calling a bash script which calls a new Java process,
>with logs or error output that might just get swallowed up invisibly. Somewhere down the line
>this could hold up effective troubleshooting.
>
>> Simon: I have actually written a stellar processor, which applies stellar to all
>> FlowFile attributes...
>
>Gulp.
>
>> Simon: what didn't you like about the flatfile loader script?
>
>The flatfile loader script has worked fine for me when prepping enrichment data in test
>systems, however it was a bit of a chore to get the JSON configuration files set up, especially
>for "wide" data sources that may have 15-20 fields, e.g. Active Directory.
>
>More broadly speaking, I want to embrace the streaming data paradigm and avoid
>batch jobs. With the DNS example, you might imagine a future where the enrichment data is
>streamed based on DHCP registrations, DNS update events, etc. In principle this could reduce
>the window of time where we might enrich a data source with out-of-date data.
>
>Charlie
>
>-----Original Message-----
>From: Carolyn Duby [mailto:cduby@hortonworks.com] 
>Sent: 12 June 2018 20:33
>To: dev@metron.apache.org
>Subject: Re: Writing enrichment data directly from NiFi with PutHBaseJSON
>
>I like the streaming enrichment solutions but it depends on how you are getting the data
>in.  If you get the data in a CSV file, just call the flat file loader from a script processor.
>No special NiFi processor required.
>
>If the enrichments don’t arrive in bulk, the streaming solution is better.
>
>Thanks
>Carolyn Duby
>Solutions Engineer, Northeast
>cduby@hortonworks.com
>+1.508.965.0584
>
>Join my team!
>Enterprise Account Manager – Boston - http://grnh.se/wepchv1
>Solutions Engineer – Boston - http://grnh.se/8gbxy41
>Need Answers? Try https://community.hortonworks.com <https://community.hortonworks.com/answers/index.html>
>
>
>On 6/12/18, 1:08 PM, "Simon Elliston Ball" <simon@simonellistonball.com> wrote:
>
>>Good solution. The streaming enrichment writer makes a lot of sense for 
>>this, especially if you're not using huge enrichment sources that need 
>>the batch based loaders.
>>
>>As it happens I have written most of a NiFi processor to handle this 
>>use case directly - both non-record and Record based, especially for Otto :).
>>The one thing we need to figure out now is where to host that, and how 
>>to handle releases of a nifi-metron-bundle. I'll probably get round to 
>>putting the code in my github at least in the next few days, while we 
>>figure out a more permanent home.
>>
>>Charlie, out of curiosity, what didn't you like about the flatfile 
>>loader script?
>>
>>Simon
>>
>>On 12 June 2018 at 18:00, Charles Joynt <Charles.Joynt@gresearch.co.uk>
>>wrote:
>>
>>> Thanks for the responses. I appreciate the willingness to look at
>>> creating a NiFi processor. That would be great!
>>>
>>> Just to follow up on this (after a week looking after the "ops" side
>>> of dev-ops): I really don't want to have to use the flatfile loader
>>> script, and I'm not going to be able to write a Metron-style HBase
>>> key generator any time soon, but I have had some success with a
>>> different approach.
>>>
>>> 1. Generate data in CSV format, e.g. "server.domain.local","A","192.168.0.198"
>>> 2. Send this to an HTTP listener in NiFi
>>> 3. Write to a Kafka topic
>>>
>>> I then followed your instructions in this blog:
>>> https://cwiki.apache.org/confluence/display/METRON/2016/06/16/Metron+Tutorial+-+Fundamentals+Part+6%3A+Streaming+Enrichment
>>>
>>> 4. Create a new "dns" sensor in Metron
>>> 5. Use the CSVParser and SimpleHbaseEnrichmentWriter, and parserConfig
>>>    settings to push this into HBase:
>>>
>>> {
>>>         "parserClassName": "org.apache.metron.parsers.csv.CSVParser",
>>>         "writerClassName": "org.apache.metron.enrichment.writer.SimpleHbaseEnrichmentWriter",
>>>         "sensorTopic": "dns",
>>>         "parserConfig": {
>>>                 "shew.table": "dns",
>>>                 "shew.cf": "dns",
>>>                 "shew.keyColumns": "name",
>>>                 "shew.enrichmentType": "dns",
>>>                 "columns": {
>>>                         "name": 0,
>>>                         "type": 1,
>>>                         "data": 2
>>>                 }
>>>         }
>>> }
>>>
>>> And... it seems to be working. At least, I have data in HBase which 
>>> looks more like the output of the flatfile loader.
>>>
>>> Charlie
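
[Illustrative sketch, not from the original thread] Steps 1 to 5 above rely on a quoted CSV line whose positions match the "columns" map in parserConfig. The Python below is a minimal sketch of that relationship only. It is not Metron's CSVParser; the helper names to_csv_line and map_columns are hypothetical, and the sample values are the ones quoted in the message.

```python
import csv
import io

# Column positions copied from the "columns" block of the parserConfig above.
COLUMNS = {"name": 0, "type": 1, "data": 2}


def to_csv_line(name, record_type, data):
    """Build one fully quoted CSV line (hypothetical helper for step 1)."""
    buf = io.StringIO()
    csv.writer(buf, quoting=csv.QUOTE_ALL).writerow([name, record_type, data])
    # csv.writer appends a line terminator; strip it so we get a bare line.
    return buf.getvalue().rstrip("\r\n")


def map_columns(line, columns=COLUMNS):
    """Map CSV positions to field names, the way an index-based column map would."""
    row = next(csv.reader([line]))
    return {field: row[index] for field, index in columns.items()}


line = to_csv_line("server.domain.local", "A", "192.168.0.198")
record = map_columns(line)
print(line)
print(record)
```

With "shew.keyColumns" set to "name", the value of record["name"] is the field that would serve as the enrichment indicator.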
>>>
>>> -----Original Message-----
>>> From: Casey Stella [mailto:cestella@gmail.com]
>>> Sent: 05 June 2018 14:56
>>> To: dev@metron.apache.org
>>> Subject: Re: Writing enrichment data directly from NiFi with 
>>> PutHBaseJSON
>>>
>>> The problem, as you correctly diagnosed, is the key in HBase.  We 
>>> construct the key very specifically in Metron, so it's unlikely to 
>>> work out of the box with the NiFi processor unfortunately.  The key 
>>> that we use is formed here in the codebase:
>>> https://github.com/cestella/incubator-metron/blob/master/metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/converter/EnrichmentKey.java#L51
>>>
>>> To put that in English, consider the following:
>>>
>>>    - type - the enrichment type
>>>    - indicator - the indicator to use
>>>    - hash(*) - a Murmur3 128-bit hash function
>>>
>>> The key is hash(indicator) + type + indicator
>>>
>>> This hash prefixing is a standard practice in HBase key design that
>>> allows the keys to be uniformly distributed among the regions and
>>> prevents hotspotting.  Depending on how the PutHBaseJSON processor
>>> works, if you can construct the key and pass it in, then you might be
>>> able to either construct the key in NiFi or write a processor to
>>> construct the key.
>>> Ultimately though, what Carolyn said is true: the easiest approach is
>>> probably using the flatfile loader.
>>> If you do get this working in NiFi, however, do please let us know 
>>> and/or consider contributing it back to the project as a PR :)
>>>
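
[Illustrative sketch, not from the original thread] The layout hash(indicator) + type + indicator can be sketched in Python as below. Two things here are assumptions rather than facts from the thread. First, the 2-byte length prefixes are inferred from the flatfile_loader.sh scan output quoted later in this thread (the "\x00\x05whois" segment), which looks like Java's DataOutputStream.writeUTF framing. Second, placeholder_hash16 is a BLAKE2b stand-in chosen only because it yields 16 bytes from the standard library. It is NOT Murmur3, so the prefix will not match what Metron writes; a real Murmur3 128-bit function would have to be supplied and checked against EnrichmentKey.java. All function names are hypothetical.

```python
import hashlib
import struct


def java_write_utf(text):
    """Length-prefixed string: 2-byte big-endian length, then the bytes.

    Matches Java's writeUTF framing for plain ASCII input only; Java's
    modified UTF-8 differs for NUL and supplementary characters.
    """
    encoded = text.encode("utf-8")
    if len(encoded) > 0xFFFF:
        raise ValueError("string too long for a 2-byte length prefix")
    return struct.pack(">H", len(encoded)) + encoded


def placeholder_hash16(data):
    """Stand-in 16-byte digest. NOT Murmur3; for layout illustration only."""
    return hashlib.blake2b(data, digest_size=16).digest()


def build_row_key(enrichment_type, indicator, hash16=placeholder_hash16):
    """Assemble hash(indicator) + type + indicator as described above."""
    prefix = hash16(indicator.encode("utf-8"))
    if len(prefix) != 16:
        raise ValueError("a 128-bit hash must produce exactly 16 bytes")
    return prefix + java_write_utf(enrichment_type) + java_write_utf(indicator)


key = build_row_key("whois", "192.168.0.198")
print(key[16:])  # the readable type + indicator suffix
```

Passing the hash in as a parameter keeps the layout logic separate from the choice of hash, so a verified Murmur3 implementation can replace the placeholder without touching the rest.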
>>>
>>>
>>> On Fri, Jun 1, 2018 at 6:26 AM Charles Joynt < 
>>> Charles.Joynt@gresearch.co.uk>
>>> wrote:
>>>
>>> > Hello,
>>> >
>>> > I work as a Dev/Ops Data Engineer within the security team at a 
>>> > company in London where we are in the process of implementing Metron.
>>> > I have been tasked with implementing feeds of network environment
>>> > data into HBase so that this data can be used as enrichment sources
>>> > for our security events.
>>> > First off, I wanted to pull in DNS data for an internal domain.
>>> >
>>> > I am assuming that I need to write data into HBase in such a way 
>>> > that it exactly matches what I would get from the 
>>> > flatfile_loader.sh script. A colleague of mine has already loaded 
>>> > some DNS data using that script, so I am using that as a reference.
>>> >
>>> > I have implemented a flow in NiFi which takes JSON data from an HTTP
>>> > listener and routes it to a PutHBaseJSON processor. The flow is
>>> > working, in the sense that data is successfully written to HBase,
>>> > but despite (naively) specifying "Row Identifier Encoding Strategy
>>> > = Binary", the results in HBase don't look correct. Comparing the
>>> > output from HBase scan commands I see:
>>> >
>>> > flatfile_loader.sh produced:
>>> >
>>> > ROW: \xFF\xFE\xCB\xB8\xEF\x92\xA3\xD9#xC\xF9\xAC\x0Ap\x1E\x00\x05whois\x00\x0E192.168.0.198
>>> > CELL: column=data:v, timestamp=1516896203840, value={"clientname":"server.domain.local","clientip":"192.168.0.198"}
>>> >
>>> > PutHBaseJSON produced:
>>> >
>>> > ROW:  server.domain.local
>>> > CELL: column=dns:v, timestamp=1527778603783, value={"name":"server.domain.local","type":"A","data":"192.168.0.198"}
>>> >
>>> > From source JSON:
>>> >
>>> >
>>> > {"k":"server.domain.local","v":{"name":"server.domain.local","type":"A","data":"192.168.0.198"}}
>>> >
>>> > I know that there are some differences in column family / field
>>> > names, but my worry is the ROW id. Presumably I need to encode my
>>> > row key, "k" in the JSON data, in a way that matches how the
>>> > flatfile_loader.sh script did it.
>>> >
>>> > Can anyone explain how I might convert my Id to the correct format?
>>> > -or-
>>> > Does this matter? Can Metron use the human-readable ROW ids?
>>> >
>>> > Charlie Joynt
>>> >
>>> >
>>>
>>
>>
>>
>>--
>>--
>>simon elliston ball
>>@sireb