metron-dev mailing list archives

From Charles Joynt <>
Subject RE: Writing enrichment data directly from NiFi with PutHBaseJSON
Date Tue, 12 Jun 2018 17:00:17 GMT
Thanks for the responses. I appreciate the willingness to look at creating a NiFi processor.
That would be great!

Just to follow up on this (after a week looking after the "ops" side of dev-ops): I really
don't want to have to use the flatfile loader script, and I'm not going to be able to write
a Metron-style HBase key generator any time soon, but I have had some success with a different
approach:

1. Generate data in CSV format, e.g. "server.domain.local","A",""
2. Send this to an HTTP listener in NiFi
3. Write to a Kafka topic
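
Steps 1 and 2 could be sketched roughly as follows. This is only an illustration: the NiFi URL, port and path are hypothetical placeholders for whatever the HTTP listener is configured with, and the function names are made up for this example.

```python
import csv
import io
import urllib.request

# Hypothetical endpoint of the NiFi HTTP listener (an assumption, not from the thread).
NIFI_URL = "http://nifi.example.local:8081/contentListener"


def to_csv_line(name, record_type, data):
    """Format one DNS record as a fully quoted CSV line, as in step 1."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow([name, record_type, data])
    return buf.getvalue()


def send(line, url=NIFI_URL):
    """POST one CSV line to the listener (step 2) and return the HTTP status."""
    req = urllib.request.Request(
        url,
        data=line.encode("utf-8"),
        headers={"Content-Type": "text/csv"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


if __name__ == "__main__":
    # Only prints the line; call send(...) once a real listener URL is known.
    print(to_csv_line("server.domain.local", "A", ""), end="")
```

Step 3 (writing to Kafka) is left to the NiFi flow itself.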

I then followed your instructions in this blog:

4. Create a new "dns" sensor in Metron
5. Use the CSVParser and SimpleHbaseEnrichmentWriter, and parserConfig settings to push this
into HBase:

	"parserClassName": "org.apache.metron.parsers.csv.CSVParser",
	"writerClassName": "org.apache.metron.enrichment.writer.SimpleHbaseEnrichmentWriter",
	"sensorTopic": "dns",
	"parserConfig": {
		"shew.table": " dns",
		"": "dns",
		"shew.keyColumns": "name",
		"shew.enrichmentType": "dns",
		"columns": {
			"name": 0,
			"type": 1,
			"data": 2
		}
	}

And... it seems to be working. At least, I have data in HBase which looks more like the output
of the flatfile loader.
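
As I understand it, the config above makes the parser name each CSV column by position and the writer use the key column as the enrichment indicator. A rough sketch of that mapping, not Metron code: the function name is hypothetical, and keeping every field in the stored value is an assumption.

```python
import csv

# Mirrors the parserConfig values quoted above.
COLUMNS = {"name": 0, "type": 1, "data": 2}
KEY_COLUMN = "name"
ENRICHMENT_TYPE = "dns"


def to_enrichment(line):
    """Turn one CSV line into (enrichment type, indicator, value fields)."""
    row = next(csv.reader([line]))
    fields = {field: row[index] for field, index in COLUMNS.items()}
    indicator = fields[KEY_COLUMN]
    # Assumption: all parsed fields are kept in the stored value.
    return ENRICHMENT_TYPE, indicator, fields


if __name__ == "__main__":
    print(to_enrichment('"server.domain.local","A",""'))
```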


-----Original Message-----
From: Casey Stella [] 
Sent: 05 June 2018 14:56
Subject: Re: Writing enrichment data directly from NiFi with PutHBaseJSON

The problem, as you correctly diagnosed, is the key in HBase.  We construct the key very specifically
in Metron, so it's unlikely to work out of the box with the NiFi processor unfortunately.
 The key that we use is formed here in the codebase:

To put that in English, consider the following:

   - type - The enrichment type
   - indicator - the indicator to use
   - hash(*) - A Murmur3 128-bit hash function

The key is hash(indicator) + type + indicator.
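
A minimal sketch of that layout, under stated assumptions. The hash here is NOT Murmur3: a 16-byte BLAKE2b digest stands in only because it ships with Python, so the prefix bytes will not match what Metron writes. The two-byte length before each string is inferred from the scan output quoted below (\x00\x05 in front of the five-character "whois"), not taken from the codebase. All function names are hypothetical.

```python
import hashlib
import struct


def length_prefixed(text):
    """Encode text as a 2-byte big-endian length followed by its UTF-8 bytes."""
    raw = text.encode("utf-8")
    return struct.pack(">H", len(raw)) + raw


def stand_in_hash(data):
    """Placeholder 16-byte hash. Swap in a real Murmur3 128-bit hash to match Metron."""
    return hashlib.blake2b(data, digest_size=16).digest()


def enrichment_key(enrichment_type, indicator, hash_fn=stand_in_hash):
    """Build hash(indicator) + type + indicator as raw bytes."""
    prefix = hash_fn(indicator.encode("utf-8"))
    return prefix + length_prefixed(enrichment_type) + length_prefixed(indicator)


if __name__ == "__main__":
    key = enrichment_key("dns", "server.domain.local")
    print(len(key), key[16:])
```

Because the hash is passed in as `hash_fn`, the layout can be checked independently of the hash implementation.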

This hash prefixing is a standard practice in HBase key design that allows the keys to be
uniformly distributed among the regions and prevents hotspotting.  Depending on how the PutHBaseJSON
processor works, if you can construct the key and pass it in, then you might be able to either
construct the key in NiFi or write a processor to construct the key.
Ultimately though, what Carolyn said is true: the easiest approach is probably using the flatfile
loader.
If you do get this working in NiFi, however, do please let us know and/or consider contributing
it back to the project as a PR :)
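
If the key is constructed outside NiFi, it still has to travel as text in the JSON. A sketch of one way to do that, assuming the processor's Binary row-id strategy accepts the same \xNN escaped notation that the HBase shell prints (worth verifying against the NiFi docs). The function name is hypothetical and the printable set follows my reading of HBase's Bytes.toStringBinary.

```python
# Bytes rendered as-is; everything else becomes an uppercase \xNN escape.
_PRINTABLE = set(
    b"0123456789"
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b" `~!@#$%^&*()-_=+[]{}|;:'\",.<>/?"
)


def to_string_binary(raw):
    """Render raw key bytes in HBase-shell style escaped notation."""
    return "".join(chr(b) if b in _PRINTABLE else "\\x%02X" % b for b in raw)


if __name__ == "__main__":
    print(to_string_binary(b"\xff\xfe\x00\x05whois"))
```

Note that a backslash is not in the printable set, so a literal backslash in the key is itself escaped.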

On Fri, Jun 1, 2018 at 6:26 AM Charles Joynt <>

> Hello,
> I work as a Dev/Ops Data Engineer within the security team at a 
> company in London where we are in the process of implementing Metron. 
> I have been tasked with implementing feeds of network environment data 
> into HBase so that this data can be used as enrichment sources for our security events.
> First-off I wanted to pull in DNS data for an internal domain.
> I am assuming that I need to write data into HBase in such a way that 
> it exactly matches what I would get from the 
> script. A colleague of mine has already loaded some DNS data using 
> that script, so I am using that as a reference.
> I have implemented a flow in NiFi which takes JSON data from an HTTP 
> listener and routes it to a PutHBaseJSON processor. The flow is 
> working, in the sense that data is successfully written to HBase, but 
> despite (naively) specifying "Row Identifier Encoding Strategy = 
> Binary", the results in HBase don't look correct. Comparing the output 
> from HBase scan commands I
> see:
> produced:
> ROW:
> \xFF\xFE\xCB\xB8\xEF\x92\xA3\xD9#xC\xF9\xAC\x0Ap\x1E\x00\x05whois\x00\x0E192.168.0.198
> CELL: column=data:v, timestamp=1516896203840, 
> value={"clientname":"server.domain.local","clientip":""}
> PutHBaseJSON produced:
> ROW:  server.domain.local
> CELL: column=dns:v, timestamp=1527778603783, 
> value={"name":"server.domain.local","type":"A","data":""}
> From source JSON:
> {"k":"server.domain.local","v":{"name":"server.domain.local","type":"A","data":""}}
> I know that there are some differences in column family / field names, 
> but my worry is the ROW id. Presumably I need to encode my row key, 
> "k" in the JSON data, in a way that matches how the script did it.
> Can anyone explain how I might convert my Id to the correct format?
> -or-
> Does this matter? Can Metron use the human-readable ROW ids?
> Charlie Joynt