metron-dev mailing list archives

From Otto Fowler <ottobackwa...@gmail.com>
Subject Re: Writing enrichment data directly from NiFi with PutHBaseJSON
Date Wed, 13 Jun 2018 14:24:40 GMT
Do we even have a Jira? If not, maybe Carolyn et al. can write one up that
lays out some requirements and context.


On June 13, 2018 at 10:04:27, Casey Stella (cestella@gmail.com) wrote:

no, sadly we do not.

On Wed, Jun 13, 2018 at 10:01 AM Carolyn Duby <cduby@hortonworks.com>
wrote:

> Agreed. Streaming enrichment is the right solution for DNS data.
>
> Do we have a web service for writing enrichments?
>
> Carolyn Duby
> Solutions Engineer, Northeast
> cduby@hortonworks.com
> +1.508.965.0584
>
> Join my team!
> Enterprise Account Manager – Boston - http://grnh.se/wepchv1
> Solutions Engineer – Boston - http://grnh.se/8gbxy41
> Need Answers? Try https://community.hortonworks.com <
> https://community.hortonworks.com/answers/index.html>
>
> On 6/13/18, 6:25 AM, "Charles Joynt" <Charles.Joynt@gresearch.co.uk>
> wrote:
>
> >Regarding why I didn't choose to load data with the flatfile loader
> >script...
> >
> >I want to be able to SEND enrichment data to Metron rather than have to
> >set up cron jobs to PULL data. At the moment I'm trying to prove that the
> >process works with a simple data source. In the future we will want
> >enrichment data in Metron that comes from systems (e.g. HR databases) that
> >I won't have access to, hence will need someone to be able to send us the
> >data.
> >
> >> Carolyn: just call the flat file loader from a script processor...
> >
> >I didn't believe that would work in my environment. I'm pretty sure the
> >script has dependencies on various Metron JARs, not least for the row id
> >hashing algorithm. I suppose this would require at least a partial install
> >of Metron alongside NiFi, and would introduce additional work on the NiFi
> >cluster for any Metron upgrade. In some (enterprise) environments there
> >might be separation of ownership between NiFi and Metron.
> >
> >I also prefer not to have a Java app calling a bash script which calls a
> >new java process, with logs or error output that might just get swallowed
> >up invisibly. Somewhere down the line this could hold up effective
> >troubleshooting.
> >
> >> Simon: I have actually written a stellar processor, which applies
> >> stellar to all FlowFile attributes...
> >
> >Gulp.
> >
> >> Simon: what didn't you like about the flatfile loader script?
> >
> >The flatfile loader script has worked fine for me when prepping
> >enrichment data in test systems, however it was a bit of a chore to get the
> >JSON configuration files set up, especially for "wide" data sources that
> >may have 15-20 fields, e.g. Active Directory.
> >
> >More broadly speaking, I want to embrace the streaming data paradigm and
> >have tried to avoid batch jobs. With the DNS example, you might imagine a
> >future where the enrichment data is streamed based on DHCP registrations,
> >DNS update events, etc. In principle this could reduce the window of time
> >where we might enrich a data source with out-of-date data.
> >
> >Charlie
> >
> >-----Original Message-----
> >From: Carolyn Duby [mailto:cduby@hortonworks.com]
> >Sent: 12 June 2018 20:33
> >To: dev@metron.apache.org
> >Subject: Re: Writing enrichment data directly from NiFi with PutHBaseJSON
> >
> >I like the streaming enrichment solutions but it depends on how you are
> >getting the data in. If you get the data in a CSV file, just call the flat
> >file loader from a script processor. No special NiFi required.
> >
> >If the enrichments don’t arrive in bulk, the streaming solution is better.
> >
> >Thanks
> >Carolyn Duby
> >Solutions Engineer, Northeast
> >cduby@hortonworks.com
> >+1.508.965.0584
> >
> >
> >
> >On 6/12/18, 1:08 PM, "Simon Elliston Ball" <simon@simonellistonball.com>
> wrote:
> >
> >>Good solution. The streaming enrichment writer makes a lot of sense for
> >>this, especially if you're not using huge enrichment sources that need
> >>the batch based loaders.
> >>
> >>As it happens I have written most of a NiFi processor to handle this
> >>use case directly - both non-record and Record based, especially for
> >>Otto :).
> >>The one thing we need to figure out now is where to host that, and how
> >>to handle releases of a nifi-metron-bundle. I'll probably get round to
> >>putting the code in my github at least in the next few days, while we
> >>figure out a more permanent home.
> >>
> >>Charlie, out of curiosity, what didn't you like about the flatfile
> >>loader script?
> >>
> >>Simon
> >>
> >>On 12 June 2018 at 18:00, Charles Joynt <Charles.Joynt@gresearch.co.uk>
> >>wrote:
> >>
> >>> Thanks for the responses. I appreciate the willingness to look at
> >>> creating a NiFi processor. That would be great!
> >>>
> >>> Just to follow up on this (after a week looking after the "ops" side
> >>> of dev-ops): I really don't want to have to use the flatfile loader
> >>> script, and I'm not going to be able to write a Metron-style HBase
> >>> key generator any time soon, but I have had some success with a
> >>> different approach.
> >>>
> >>> 1. Generate data in CSV format, e.g.
> >>>    "server.domain.local","A","192.168.0.198"
> >>> 2. Send this to an HTTP listener in NiFi
> >>> 3. Write to a Kafka topic
> >>>
> >>> I then followed your instructions in this blog:
> >>> https://cwiki.apache.org/confluence/display/METRON/2016/06/16/Metron+Tutorial+-+Fundamentals+Part+6%3A+Streaming+Enrichment
> >>>
> >>> 4. Create a new "dns" sensor in Metron
> >>> 5. Use the CSVParser and SimpleHbaseEnrichmentWriter, and parserConfig
> >>>    settings to push this into HBase:
> >>>
> >>> {
> >>>   "parserClassName": "org.apache.metron.parsers.csv.CSVParser",
> >>>   "writerClassName": "org.apache.metron.enrichment.writer.SimpleHbaseEnrichmentWriter",
> >>>   "sensorTopic": "dns",
> >>>   "parserConfig": {
> >>>     "shew.table": "dns",
> >>>     "shew.cf": "dns",
> >>>     "shew.keyColumns": "name",
> >>>     "shew.enrichmentType": "dns",
> >>>     "columns": {
> >>>       "name": 0,
> >>>       "type": 1,
> >>>       "data": 2
> >>>     }
> >>>   }
> >>> }
> >>>
> >>> And... it seems to be working. At least, I have data in HBase which
> >>> looks more like the output of the flatfile loader.
> >>>
> >>> Charlie
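
To make the role of the "columns" and "shew.keyColumns" settings in the config above concrete, here is a minimal Python sketch of how a CSV line could be mapped to named fields and how the key column could be picked out as the enrichment indicator. It only illustrates the mapping idea; it is not Metron's CSVParser or SimpleHbaseEnrichmentWriter code, and the names `parse_line`, `COLUMNS` and `KEY_COLUMN` are hypothetical.

```python
import csv
import io

# Mirrors the "columns" block of the parser config: field name -> CSV position.
COLUMNS = {"name": 0, "type": 1, "data": 2}

# Mirrors "shew.keyColumns": the field whose value becomes the indicator.
KEY_COLUMN = "name"


def parse_line(line, columns):
    """Split one CSV line and label each value using the column index map."""
    row = next(csv.reader(io.StringIO(line)))
    return {field: row[index] for field, index in columns.items()}


record = parse_line('"server.domain.local","A","192.168.0.198"', COLUMNS)
indicator = record[KEY_COLUMN]

print(record)
print(indicator)
```

In this sketch the whole record would become the enrichment value and `indicator` would feed the row key, which matches the shape of the data Charlie reports seeing in HBase.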
> >>>
> >>> -----Original Message-----
> >>> From: Casey Stella [mailto:cestella@gmail.com]
> >>> Sent: 05 June 2018 14:56
> >>> To: dev@metron.apache.org
> >>> Subject: Re: Writing enrichment data directly from NiFi with
> >>> PutHBaseJSON
> >>>
> >>> The problem, as you correctly diagnosed, is the key in HBase. We
> >>> construct the key very specifically in Metron, so it's unlikely to
> >>> work out of the box with the NiFi processor unfortunately. The key
> >>> that we use is formed here in the codebase:
> >>> https://github.com/cestella/incubator-metron/blob/master/metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/converter/EnrichmentKey.java#L51
> >>>
> >>> To put that in English, consider the following:
> >>>
> >>> - type - the enrichment type
> >>> - indicator - the indicator to use
> >>> - hash(*) - a Murmur3 128-bit hash function
> >>>
> >>> The key is hash(indicator) + type + indicator
> >>>
> >>> This hash prefixing is a standard practice in HBase key design that
> >>> allows the keys to be uniformly distributed among the regions and
> >>> prevents hotspotting. Depending on how the PutHBaseJSON processor
> >>> works, if it lets you pass in a pre-built key, then you might be
> >>> able to either construct the key in NiFi or write a processor to
> >>> construct the key.
> >>> Ultimately though, what Carolyn said is true: the easiest approach is
> >>> probably using the flatfile loader.
> >>> If you do get this working in NiFi, however, do please let us know
> >>> and/or consider contributing it back to the project as a PR :)
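
The key layout Casey describes, hash(indicator) + type + indicator, can be sketched in Python as below. Several things here are assumptions rather than Metron's actual code: the two-byte big-endian length in front of each string is inferred from the `\x00\x05whois` fragment in the flatfile loader scan output quoted in this thread, the helper names are hypothetical, and the demo uses MD5 purely as a 16-byte stand-in because Murmur3 is not in the Python standard library. Keys built with the stand-in will NOT match keys written by Metron; a real implementation would need the same Murmur3 128-bit variant and byte order that EnrichmentKey.java uses.

```python
import hashlib
import struct


def length_prefixed(text):
    """Encode a string as a 2-byte big-endian length followed by UTF-8 bytes
    (assumed layout, inferred from the scan output in the thread)."""
    encoded = text.encode("utf-8")
    return struct.pack(">H", len(encoded)) + encoded


def build_row_key(enrichment_type, indicator, hash_fn):
    """Assemble hash(indicator) + type + indicator.

    hash_fn is injected so that a real Murmur3 128-bit implementation
    can be swapped in for the stand-in used below.
    """
    prefix = hash_fn(indicator.encode("utf-8"))
    return prefix + length_prefixed(enrichment_type) + length_prefixed(indicator)


def stand_in_hash(data):
    # NOT Murmur3: MD5 is used only because it also yields 16 bytes.
    return hashlib.md5(data).digest()


key = build_row_key("whois", "192.168.0.198", stand_in_hash)
print(len(key))
print(key[16:])
```

Because the hash comes first, keys for similar indicators (for example, consecutive IP addresses) land in different parts of the key space, which is the region-distribution effect described above.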
> >>>
> >>>
> >>>
> >>> On Fri, Jun 1, 2018 at 6:26 AM Charles Joynt <
> >>> Charles.Joynt@gresearch.co.uk>
> >>> wrote:
> >>>
> >>> > Hello,
> >>> >
> >>> > I work as a Dev/Ops Data Engineer within the security team at a
> >>> > company in London where we are in the process of implementing Metron.
> >>> > I have been tasked with implementing feeds of network environment
> >>> > data into HBase so that this data can be used as enrichment sources
> >>> > for our security events.
> >>> > First-off I wanted to pull in DNS data for an internal domain.
> >>> >
> >>> > I am assuming that I need to write data into HBase in such a way
> >>> > that it exactly matches what I would get from the
> >>> > flatfile_loader.sh script. A colleague of mine has already loaded
> >>> > some DNS data using that script, so I am using that as a reference.
> >>> >
> >>> > I have implemented a flow in NiFi which takes JSON data from a HTTP
> >>> > listener and routes it to a PutHBaseJSON processor. The flow is
> >>> > working, in the sense that data is successfully written to HBase,
> >>> > but despite (naively) specifying "Row Identifier Encoding Strategy
> >>> > = Binary", the results in HBase don't look correct. Comparing the
> >>> > output from HBase scan commands I
> >>> > see:
> >>> >
> >>> > flatfile_loader.sh produced:
> >>> >
> >>> > ROW:
> >>> > \xFF\xFE\xCB\xB8\xEF\x92\xA3\xD9#xC\xF9\xAC\x0Ap\x1E\x00\x05whois\x00\x0E192.168.0.198
> >>> > CELL: column=data:v, timestamp=1516896203840,
> >>> > value={"clientname":"server.domain.local","clientip":"192.168.0.198"}
> >>> >
> >>> > PutHBaseJSON produced:
> >>> >
> >>> > ROW: server.domain.local
> >>> > CELL: column=dns:v, timestamp=1527778603783,
> >>> > value={"name":"server.domain.local","type":"A","data":"192.168.0.198"}
> >>> >
> >>> > From source JSON:
> >>> >
> >>> > {"k":"server.domain.local","v":{"name":"server.domain.local","type":"A","data":"192.168.0.198"}}
> >>> >
> >>> > I know that there are some differences in column family / field
> >>> > names, but my worry is the ROW id. Presumably I need to encode my
> >>> > row key, "k" in the JSON data, in a way that matches how the
> >>> > flatfile_loader.sh script did it.
> >>> >
> >>> > Can anyone explain how I might convert my Id to the correct format?
> >>> > -or-
> >>> > Does this matter? Can Metron use the human-readable ROW ids?
> >>> >
> >>> > Charlie Joynt
> >>> >
> >>> >
> >>>
> >>
> >>
> >>
> >>--
> >>--
> >>simon elliston ball
> >>@sireb
>
