metron-dev mailing list archives

From: Dima Kovalyov
Subject: Re: [DISCUSS] Moving GeoIP management away from MySQL
Date: Mon, 16 Jan 2017 15:53:26 GMT

Hello Justin,

Considering that Metron uses HBase tables for storing enrichment and
threat intel feeds, can we use HBase for geo enrichment as well?
Or could MapDB be used for enrichment and threat intel feeds instead of HBase?

- Dima

On 01/16/2017 04:17 PM, Justin Leet wrote:
> Hi all,
> As a bit of background, right now, GeoIP data is loaded into and managed by
> MySQL (the connectors are LGPL licensed and we need to sever our Maven
> dependency on them before the next release). We currently depend on and install
> an instance of MySQL (in each of the Management Pack, Ansible, and Docker
> installs). In the topology, we use the JDBCAdapter to connect to MySQL and
> query for a given IP.  Additionally, MySQL is a single point of failure for
> that particular enrichment right now.  If MySQL is down, geo enrichment
> can't occur.
> I'm proposing that we eliminate the use of MySQL entirely, through all
> installation paths (which, unless I missed some, includes Ansible, the
> Ambari Management Pack, and Docker).  We'd do this by dropping all the
> various MySQL setup and management through the code, along with all the
> DDL, etc.  The JDBCAdapter would stay, so that anybody who wants to set up
> their own databases for enrichments and install connectors is able to do so.
> In its place, I've looked at using MapDB, which is a really easy-to-use
> library for creating Java collections backed by a file (this is NOT a
> separate installation of anything, it's just a jar that manages interaction
> with the file system).  Given the slow churn of the GeoIP files (I believe
> they get updated once a week), we can have a script that can be run when
> needed, downloads the MaxMind tar file, builds the MapDB file that will be
> used by the bolts, and places it into HDFS.  Finally, we update a config to
> point to the new file, the bolts get the updated config callback and can
> update their db files.  Inside the code, we wrap the MapDB portions to make
> it transparent to downstream code.
> The particularly nice parts about using MapDB are its ease of use and that
> it offers, out of the box, the utilities we need to support the operations
> we run on this data (keep in mind the GeoIP files use IP ranges and
> we need to be able to easily grab the appropriate range).
> The main point of concern I have about this is that when we grab the HDFS
> file during an update, given that multiple JVMs can be running, we don't
> want them to clobber each other. I believe this can be avoided by simply
> using each worker's working directory to store the file (and ensuring that
> threads in the same JVM synchronize their access to it).  This should keep
> the JVMs (and the underlying DB files) entirely independent.
> This script would get called by the various installations during startup to
> do the initial setup.  After install, it can then be called on demand in
> order.
> At this point, we should be all set, with everything running and updatable.
> Justin
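
To make the range lookup mentioned above concrete (GeoIP data is keyed by IP
ranges, and the bolt needs the range containing a given address), here is a
minimal Java sketch. It uses java.util.TreeMap as a stand-in rather than MapDB
itself; the assumption is that a file-backed sorted map would be queried the
same way through floorEntry. The class name, method names, and sample data are
hypothetical and not taken from Metron.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class GeoRangeLookup {
    /** One GeoIP block: inclusive end address plus a location label. */
    static final class Range {
        final long end;
        final String location;

        Range(long end, String location) {
            this.end = end;
            this.location = location;
        }
    }

    // Keyed by the inclusive start address of each range.
    // Assumption: a MapDB-backed sorted map would replace this TreeMap.
    private final NavigableMap<Long, Range> ranges = new TreeMap<>();

    /** Converts a dotted-quad IPv4 string to its unsigned 32-bit value. */
    static long ipToLong(String ip) {
        String[] parts = ip.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 address: " + ip);
        }
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("Bad octet in: " + ip);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    void addRange(String startIp, String endIp, String location) {
        ranges.put(ipToLong(startIp), new Range(ipToLong(endIp), location));
    }

    /** Returns the location of the range containing ip, or null if none does. */
    String lookup(String ip) {
        long addr = ipToLong(ip);
        // Greatest range start that is <= addr, then check the range end.
        Map.Entry<Long, Range> entry = ranges.floorEntry(addr);
        if (entry == null || addr > entry.getValue().end) {
            return null;
        }
        return entry.getValue().location;
    }

    public static void main(String[] args) {
        GeoRangeLookup geo = new GeoRangeLookup();
        geo.addRange("10.0.0.0", "10.0.0.255", "site-a"); // made-up sample data
        System.out.println(geo.lookup("10.0.0.42"));
        System.out.println(geo.lookup("10.0.1.1"));
    }
}
```

The sketch assumes ranges do not overlap, so the range with the greatest start
at or below the address is the only candidate.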

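The per-worker update step from the quoted proposal (each JVM keeps its own
copy of the DB file in its working directory and swaps it when the config
changes) could look roughly like the sketch below. It is only an illustration:
the fetch from HDFS is replaced by a plain local file copy, and the class name,
method names, and file names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.atomic.AtomicReference;

class WorkerLocalDbFile {
    private final Path workerDir;
    // Readers on other threads always see either the old or the new path.
    private final AtomicReference<Path> current = new AtomicReference<>();

    WorkerLocalDbFile(Path workerDir) {
        this.workerDir = workerDir;
    }

    /**
     * Called when the config points at a new file. Synchronized so that
     * threads in the same JVM do not copy over each other.
     */
    synchronized Path update(Path source) throws IOException {
        Files.createDirectories(workerDir);
        // Copy under a temporary name first, then rename into place, so a
        // reader does not pick up a half-written file under the final name.
        Path tmp = Files.createTempFile(workerDir, "geo-", ".tmp");
        Files.copy(source, tmp, StandardCopyOption.REPLACE_EXISTING);
        Path target = workerDir.resolve(source.getFileName());
        Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        current.set(target);
        // Cleanup of older local copies is left out of this sketch.
        return target;
    }

    Path current() {
        return current.get();
    }

    /** Small self-contained demo using temporary directories. */
    static String demo() {
        try {
            Path base = Files.createTempDirectory("geo-demo");
            Path source = base.resolve("geo-v1.db"); // stands in for the HDFS file
            Files.write(source, "v1".getBytes(StandardCharsets.UTF_8));
            WorkerLocalDbFile local = new WorkerLocalDbFile(base.resolve("worker-1"));
            local.update(source);
            return new String(Files.readAllBytes(local.current()), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because every worker writes only inside its own directory, separate JVMs never
touch the same local file, which is the independence the proposal is after.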