metron-dev mailing list archives

From Simon Elliston Ball <si...@simonellistonball.com>
Subject Re: Writing enrichment data directly from NiFi with PutHBaseJSON
Date Wed, 13 Jun 2018 14:16:49 GMT
That’s where something like the NiFi solution would come in...

With the PutEnrichment processor and a ProcessHttpRequest processor, you do have a web service
for loading enrichments.

We could probably also create a REST service endpoint for it, which would make some sense,
but there is a nice multi-source, queuing, and lineage element to the NiFi solution.

Simon 
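
As an illustration, a minimal sketch of what such a REST-style enrichment ingest endpoint might look like is below. This is not an existing Metron or NiFi API; the class name, path, and port are invented for illustration, and a real service would validate the record and hand it off to the streaming enrichment pipeline (for example a Kafka topic) rather than just logging it.

    import com.sun.net.httpserver.HttpServer;
    import java.net.InetSocketAddress;
    import java.nio.charset.StandardCharsets;

    // Hypothetical sketch only: accept an enrichment record as JSON over HTTP.
    public class EnrichmentIngestSketch {
      public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8182), 0);
        server.createContext("/enrichment", exchange -> {
          // Read the POSTed enrichment record; a real endpoint would parse and
          // forward it (e.g. to a Kafka topic feeding a streaming enrichment parser).
          byte[] body = exchange.getRequestBody().readAllBytes();
          System.out.println("received: " + new String(body, StandardCharsets.UTF_8));
          exchange.sendResponseHeaders(200, -1); // 200 OK, no response body
          exchange.close();
        });
        server.start();
      }
    }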

> On 13 Jun 2018, at 15:04, Casey Stella <cestella@gmail.com> wrote:
> 
> no, sadly we do not.
> 
>> On Wed, Jun 13, 2018 at 10:01 AM Carolyn Duby <cduby@hortonworks.com> wrote:
>> 
>> Agreed… streaming enrichment is the right solution for DNS data.
>> 
>> Do we have a web service for writing enrichments?
>> 
>> Carolyn Duby
>> Solutions Engineer, Northeast
>> cduby@hortonworks.com
>> +1.508.965.0584
>> 
>> Join my team!
>> Enterprise Account Manager – Boston - http://grnh.se/wepchv1
>> Solutions Engineer – Boston - http://grnh.se/8gbxy41
>> Need Answers? Try https://community.hortonworks.com <
>> https://community.hortonworks.com/answers/index.html>
>> 
>> On 6/13/18, 6:25 AM, "Charles Joynt" <Charles.Joynt@gresearch.co.uk>
>> wrote:
>> 
>>> Regarding why I didn't choose to load data with the flatfile loader
>> script...
>>> 
>>> I want to be able to SEND enrichment data to Metron rather than have to
>>> set up cron jobs to PULL data. At the moment I'm trying to prove that the
>>> process works with a simple data source. In the future we will want
>>> enrichment data in Metron that comes from systems (e.g. HR databases) that
>>> I won't have access to, hence we will need someone to be able to send us
>>> the data.
>>> 
>>>> Carolyn: just call the flat file loader from a script processor...
>>> 
>>> I didn't believe that would work in my environment. I'm pretty sure the
>> script has dependencies on various Metron JARs, not least for the row id
>> hashing algorithm. I suppose this would require at least a partial install
>> of Metron alongside NiFi, and would introduce additional work on the NiFi
>> cluster for any Metron upgrade. In some (enterprise) environments there
>> might be separation of ownership between NiFi and Metron.
>>> 
>>> I also prefer not to have a Java app calling a bash script which calls a
>>> new Java process, with logs or error output that might just get swallowed
>>> up invisibly. Somewhere down the line this could hold up effective
>>> troubleshooting.
>>> 
>>>> Simon: I have actually written a Stellar processor, which applies
>>>> Stellar to all FlowFile attributes...
>>> 
>>> Gulp.
>>> 
>>>> Simon: what didn't you like about the flatfile loader script?
>>> 
>>> The flatfile loader script has worked fine for me when prepping
>> enrichment data in test systems, however it was a bit of a chore to get the
>> JSON configuration files set up, especially for "wide" data sources that
>> may have 15-20 fields, e.g. Active Directory.
>>> 
>>> More broadly speaking, I want to embrace the streaming data paradigm and
>>> to avoid batch jobs. With the DNS example, you might imagine a future
>>> where the enrichment data is streamed based on DHCP registrations, DNS
>>> update events, etc. In principle this could reduce the window of time
>>> where we might enrich a data source with out-of-date data.
>>> 
>>> Charlie
>>> 
>>> -----Original Message-----
>>> From: Carolyn Duby [mailto:cduby@hortonworks.com]
>>> Sent: 12 June 2018 20:33
>>> To: dev@metron.apache.org
>>> Subject: Re: Writing enrichment data directly from NiFi with PutHBaseJSON
>>> 
>>> I like the streaming enrichment solution, but it depends on how you are
>>> getting the data in. If you get the data in a CSV file, just call the flat
>>> file loader from a script processor. No special NiFi required.
>>> 
>>> If the enrichments don’t arrive in bulk, the streaming solution is better.
>>> 
>>> Thanks
>>> Carolyn Duby
>>> Solutions Engineer, Northeast
>>> cduby@hortonworks.com
>>> +1.508.965.0584
>>> 
>>> Join my team!
>>> Enterprise Account Manager – Boston - http://grnh.se/wepchv1
>>> Solutions Engineer – Boston - http://grnh.se/8gbxy41
>>> Need Answers? Try https://community.hortonworks.com <
>>> https://community.hortonworks.com/answers/index.html>
>>> 
>>> 
>>> On 6/12/18, 1:08 PM, "Simon Elliston Ball" <simon@simonellistonball.com>
>> wrote:
>>> 
>>>> Good solution. The streaming enrichment writer makes a lot of sense for
>>>> this, especially if you're not using huge enrichment sources that need
>>>> the batch based loaders.
>>>> 
>>>> As it happens I have written most of a NiFi processor to handle this
>>>> use case directly - both non-record and Record based, especially for
>> Otto :).
>>>> The one thing we need to figure out now is where to host that, and how
>>>> to handle releases of a nifi-metron-bundle. I'll probably get round to
>>>> putting the code in my github at least in the next few days, while we
>>>> figure out a more permanent home.
>>>> 
>>>> Charlie, out of curiosity, what didn't you like about the flatfile
>>>> loader script?
>>>> 
>>>> Simon
>>>> 
>>>> On 12 June 2018 at 18:00, Charles Joynt <Charles.Joynt@gresearch.co.uk>
>>>> wrote:
>>>> 
>>>>> Thanks for the responses. I appreciate the willingness to look at
>>>>> creating a NiFi processer. That would be great!
>>>>> 
>>>>> Just to follow up on this (after a week looking after the "ops" side
>>>>> of
>>>>> dev-ops): I really don't want to have to use the flatfile loader
>>>>> script, and I'm not going to be able to write a Metron-style HBase
>>>>> key generator any time soon, but I have had some success with a
>> different approach.
>>>>> 
>>>>> 1. Generate data in CSV format, e.g. "server.domain.local","A","192.168.0.198"
>>>>> 2. Send this to an HTTP listener in NiFi
>>>>> 3. Write to a Kafka topic
>>>>> 
>>>>> I then followed your instructions in this blog:
>>>>> https://cwiki.apache.org/confluence/display/METRON/2016/06/16/Metron+Tutorial+-+Fundamentals+Part+6%3A+Streaming+Enrichment
>>>>> 
>>>>> 4. Create a new "dns" sensor in Metron
>>>>> 5. Use the CSVParser and SimpleHbaseEnrichmentWriter, and parserConfig settings
>>>>>    to push this into HBase:
>>>>> 
>>>>> {
>>>>>        "parserClassName": "org.apache.metron.parsers.csv.CSVParser",
>>>>>        "writerClassName": "org.apache.metron.enrichment.writer.SimpleHbaseEnrichmentWriter",
>>>>>        "sensorTopic": "dns",
>>>>>        "parserConfig": {
>>>>>                "shew.table": "dns",
>>>>>                "shew.cf": "dns",
>>>>>                "shew.keyColumns": "name",
>>>>>                "shew.enrichmentType": "dns",
>>>>>                "columns": {
>>>>>                        "name": 0,
>>>>>                        "type": 1,
>>>>>                        "data": 2
>>>>>                }
>>>>>        }
>>>>> }
>>>>> 
>>>>> And... it seems to be working. At least, I have data in HBase which
>>>>> looks more like the output of the flatfile loader.
>>>>> 
>>>>> Charlie
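
For step 2 of the flow above, a minimal client-side sketch of sending one CSV record to the NiFi HTTP listener might look like the following. The listener host, port, and path are assumptions for illustration; NiFi is then expected to route the body to the "dns" Kafka topic as described.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    // Hypothetical sender for one DNS enrichment record (CSV), posted to a NiFi HTTP listener.
    public class DnsEnrichmentSender {
      public static void main(String[] args) throws Exception {
        String csvRecord = "\"server.domain.local\",\"A\",\"192.168.0.198\"";

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://nifi.example.local:9999/dns-enrichment")) // assumed listener
            .header("Content-Type", "text/csv")
            .POST(HttpRequest.BodyPublishers.ofString(csvRecord))
            .build();

        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("listener responded: " + response.statusCode());
      }
    }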
>>>>> 
>>>>> -----Original Message-----
>>>>> From: Casey Stella [mailto:cestella@gmail.com]
>>>>> Sent: 05 June 2018 14:56
>>>>> To: dev@metron.apache.org
>>>>> Subject: Re: Writing enrichment data directly from NiFi with
>>>>> PutHBaseJSON
>>>>> 
>>>>> The problem, as you correctly diagnosed, is the key in HBase.  We
>>>>> construct the key very specifically in Metron, so it's unlikely to
>>>>> work out of the box with the NiFi processor unfortunately.  The key
>>>>> that we use is formed here in the codebase:
>>>>> https://github.com/cestella/incubator-metron/blob/master/metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/converter/EnrichmentKey.java#L51
>>>>> 
>>>>> To put that in English, consider the following:
>>>>> 
>>>>>   - type - the enrichment type
>>>>>   - indicator - the indicator to use
>>>>>   - hash(*) - a Murmur3 128-bit hash function
>>>>> 
>>>>> The key is hash(indicator) + type + indicator.
>>>>> 
>>>>> This hash prefixing is a standard practice in HBase key design that
>>>>> allows the keys to be uniformly distributed among the regions and
>>>>> prevents hotspotting.  Depending on how the PutHBaseJSON processor
>>>>> works, if you can construct the key and pass it in, then you might be
>>>>> able to either construct the key in NiFi or write a processor to
>>>>> construct the key.
>>>>> Ultimately though, what Carolyn said is true: the easiest approach is
>>>>> probably using the flatfile loader.
>>>>> If you do get this working in NiFi, however, do please let us know
>>>>> and/or consider contributing it back to the project as a PR :)
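
A minimal sketch of that key construction outside Metron is below. It assumes Guava's Murmur3 128-bit hash and a 2-byte length prefix before the type and indicator (the \x00\x05 before "whois" in the scan output quoted further down suggests such a prefix); the exact byte layout should be verified against EnrichmentKey.java before relying on it.

    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;
    import com.google.common.hash.Hashing;

    // Hypothetical sketch of a Metron-style enrichment row key:
    // hash(indicator) + type + indicator, as described above.
    public class EnrichmentRowKeySketch {
      public static byte[] rowKey(String type, String indicator) {
        // 16-byte Murmur3 128 hash of the indicator, used as a prefix to avoid hotspotting
        byte[] hash = Hashing.murmur3_128()
            .hashString(indicator, StandardCharsets.UTF_8)
            .asBytes();
        byte[] typeBytes = lengthPrefixed(type);
        byte[] indicatorBytes = lengthPrefixed(indicator);
        return ByteBuffer.allocate(hash.length + typeBytes.length + indicatorBytes.length)
            .put(hash).put(typeBytes).put(indicatorBytes).array();
      }

      // Assumed encoding: 2-byte big-endian length followed by UTF-8 bytes.
      private static byte[] lengthPrefixed(String s) {
        byte[] raw = s.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(2 + raw.length).putShort((short) raw.length).put(raw).array();
      }

      public static void main(String[] args) {
        byte[] key = rowKey("dns", "server.domain.local");
        System.out.println("row key length: " + key.length); // 16 + (2+3) + (2+19) = 42 bytes
      }
    }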
>>>>> 
>>>>> 
>>>>> 
>>>>> On Fri, Jun 1, 2018 at 6:26 AM Charles Joynt <
>>>>> Charles.Joynt@gresearch.co.uk>
>>>>> wrote:
>>>>> 
>>>>>> Hello,
>>>>>> 
>>>>>> I work as a Dev/Ops Data Engineer within the security team at a
>>>>>> company in London where we are in the process of implementing Metron.
>>>>>> I have been tasked with implementing feeds of network environment
>>>>>> data into HBase so that this data can be used as enrichment sources
>>>>>> for our security events.
>>>>>> First off, I wanted to pull in DNS data for an internal domain.
>>>>>> 
>>>>>> I am assuming that I need to write data into HBase in such a way
>>>>>> that it exactly matches what I would get from the
>>>>>> flatfile_loader.sh script. A colleague of mine has already loaded
>>>>>> some DNS data using that script, so I am using that as a reference.
>>>>>> 
>>>>>> I have implemented a flow in NiFi which takes JSON data from an HTTP
>>>>>> listener and routes it to a PutHBaseJSON processor. The flow is
>>>>>> working, in the sense that data is successfully written to HBase,
>>>>>> but despite (naively) specifying "Row Identifier Encoding Strategy
>>>>>> = Binary", the results in HBase don't look correct. Comparing the
>>>>>> output from HBase scan commands I see:
>>>>>> 
>>>>>> flatfile_loader.sh produced:
>>>>>> 
>>>>>> ROW:  \xFF\xFE\xCB\xB8\xEF\x92\xA3\xD9#xC\xF9\xAC\x0Ap\x1E\x00\x05whois\x00\x0E192.168.0.198
>>>>>> CELL: column=data:v, timestamp=1516896203840, value={"clientname":"server.domain.local","clientip":"192.168.0.198"}
>>>>>> 
>>>>>> PutHBaseJSON produced:
>>>>>> 
>>>>>> ROW:  server.domain.local
>>>>>> CELL: column=dns:v, timestamp=1527778603783, value={"name":"server.domain.local","type":"A","data":"192.168.0.198"}
>>>>>> 
>>>>>> From source JSON:
>>>>>> 
>>>>>> 
>>>>>> {"k":"server.domain.local","v":{"name":"server.domain.local","type":"A","data":"192.168.0.198"}}
>>>>>> 
>>>>>> I know that there are some differences in column family / field
>>>>>> names, but my worry is the ROW id. Presumably I need to encode my
>>>>>> row key, "k" in the JSON data, in a way that matches how the
>>>>>> flatfile_loader.sh
>>>>> script did it.
>>>>>> 
>>>>>> Can anyone explain how I might convert my id to the correct format?
>>>>>> -or-
>>>>>> Does this matter? Can Metron use the human-readable ROW ids?
>>>>>> 
>>>>>> Charlie Joynt
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> simon elliston ball
>>>> @sireb
>> 
