metron-dev mailing list archives

From Matt Foley <ma...@apache.org>
Subject Re: Question about the customization of Metron with my machine learning algo.
Date Tue, 06 Jun 2017 18:36:50 GMT
Hope you don’t mind if I chime in. There are a couple of very basic points which are in the
documentation but may not jump out at a new user who is trying to learn Metron at the same
time as MaaS.

1. In the thread below there is only a brief reference to the main documentation page for
MaaS (Model as a Service), at https://github.com/apache/metron/tree/master/metron-analytics/metron-maas-service
Hopefully you’ve read it; if not, please do.

2. The “model”, with its required REST API, is expected to run in its own subsystem,
which may or may not be co-resident with the Metron installation, depending on load considerations.
Metron provides very useful optional infrastructure for YARN provisioning, deployment, and
monitoring of co-resident models, as described on the web page above. However, the model subsystem
is, by design, considered external to Metron itself.
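
To make "a model with a REST API, running outside Metron" concrete, here is a minimal sketch of a toy model served over HTTP with only Python's standard library. It is not the MaaS contract: the `/apply` path, the `host` query parameter, the `is_malicious` response field and the scoring rule are all hypothetical, and a real MaaS model must also make its endpoint known to MaaS as described in the documentation linked above.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def score(host: str) -> str:
    """Toy stand-in for a real model: flags long, digit-heavy hostnames."""
    digits = sum(ch.isdigit() for ch in host)
    return "malicious" if len(host) > 20 and digits >= 3 else "legit"


class ModelHandler(BaseHTTPRequestHandler):
    """Answers GET /apply?host=<name> with a JSON verdict (hypothetical API)."""

    def do_GET(self):
        url = urlparse(self.path)
        params = parse_qs(url.query)
        if url.path != "/apply" or "host" not in params:
            self.send_error(400, "expected GET /apply?host=<name>")
            return
        body = json.dumps({"is_malicious": score(params["host"][0])}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet instead of logging every request


def make_server(port: int = 0) -> HTTPServer:
    """Bind to localhost; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), ModelHandler)
```

To try it by hand, run `make_server(8080).serve_forever()` and request `http://127.0.0.1:8080/apply?host=example`. Swapping `score` for a trained scikit-learn classifier changes nothing about how Metron would reach the model, which is the point of keeping it a separate subsystem.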

3. The interface to MaaS is via a few specific Stellar function outcalls, which access the
model’s REST API as well as configuration information in Zookeeper. These calls may be used anywhere
Stellar is accepted. The most logical place to use an outcall to MaaS is in a Stellar Enrichment
bolt, but it also makes sense to use it in a Stellar field transformation in a Parser bolt.
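
Roughly speaking, `MAAS_GET_ENDPOINT` looks up where a deployed model is listening, and `MAAS_MODEL_APPLY` sends it a map of arguments and hands back the model's response as a map. The Python below mimics that round trip for intuition only. It assumes the arguments travel as URL query parameters on an HTTP GET and the response is a JSON object; `build_url`, `model_apply` and the injectable `fetch` parameter are hypothetical names, not Metron APIs.

```python
import json
from typing import Any, Callable, Dict, Optional
from urllib.parse import urlencode
from urllib.request import urlopen


def build_url(endpoint: str, method: str, args: Dict[str, str]) -> str:
    """Join the model's base URL, a method path and URL-encoded query arguments."""
    return f"{endpoint.rstrip('/')}/{method.lstrip('/')}?{urlencode(args)}"


def model_apply(endpoint: str, args: Dict[str, str], method: str = "apply",
                fetch: Optional[Callable[[str], Any]] = None,
                timeout_s: float = 2.0) -> Dict[str, Any]:
    """Hypothetical analogue of a MaaS outcall: GET endpoint/method?args, decode JSON.

    `fetch` is injectable so the logic can be exercised without a live model.
    """
    def default_fetch(url: str) -> Any:
        return urlopen(url, timeout=timeout_s)  # real HTTP GET

    with (fetch or default_fetch)(build_url(endpoint, method, args)) as resp:
        return json.load(resp)
```

Pulling a single field out of the returned map is then an ordinary lookup (`MAP_GET` in Stellar, `.get("is_malicious")` here).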

Hope this is useful,
--Matt

On 6/6/17, 10:43 AM, "Casey Stella" <cestella@gmail.com> wrote:

    So, first off, it's not a basic question at all, and thanks for asking it.
    I'm sure that if it's not clear to you, then it's not clear to many, and it
    bears some reinforcement and clarification.
    
    
       - Metron does indeed enable the deployment and use of machine learning
       models on data ingested into Metron
       - Metron runs atop Hadoop (Storm + Kafka + HDFS + HBase), so you likely
       wouldn't run this successfully on a VM, but rather on a cluster. We do
       support running Metron inside a VM for demonstration and development
       purposes, but I'd like to make clear that that's not a production
       configuration.
    
    Models deployed via MaaS can be interacted with via Stellar on data
    ingested into Metron, with a couple of caveats. There are two ways to ingest
    data into Metron:
    
       - Via a packet capture sensor (fastcapa) to Kafka to the pcap Storm
       topology, which writes directly to HDFS with no preamble or enrichment
       - Via another, lower-velocity sensor (e.g. Bro for deep packet
       inspection or YAF for flow data) which is routed to a parser topology, then
       to enrichment and finally to indexing
    
    We do not, at present, support interacting with models (or, indeed, any
    enrichment) on raw packet data (the first case above). We do, however,
    support it in the second use case. The example at
    https://github.com/apache/metron/tree/master/metron-analytics/metron-maas-service#example
    demonstrates ingesting web proxy data and using a dummy machine learning
    model to pick out domains which are synthetic and likely to represent
    communication with a botnet (the DGA model in that example is crude and could
    easily be replaced with the example I posed earlier, btw).
    
    Anyway, so for you to use your own ML model, you'd do the following:
    
       1. Ingest data from the sensor you are interested in into a Kafka
       topic
       2. Create or reuse one of the existing parsers that we support to
       convert the data from your data source
       3. Create your model (see
       https://gist.github.com/cestella/8dd83031b8898a732b6a5a60fce1b616 as
       an example)
       4. Refer to your model from Stellar
          1. In the example I mentioned, we're doing that at
          https://github.com/apache/metron/tree/master/metron-analytics/metron-maas-service#adjust-configurations-for-squid-to-call-model
          2. You might consider doing it in the enrichment topology, but to get
          you started, doing it as a field transformation as in the example should
          suffice
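
Steps 2 through 4 can be pictured as one small function chain: parse a raw line into named fields, derive the value the model needs, call the model, and store its answer as a new field. The sketch below does this in plain Python for a simplified, hypothetical proxy-log format (`timestamp client_ip url`). The field names, the naive subdomain stripping and the `model` callable are illustrative assumptions; in Metron itself the parsing is done by a parser topology and the model call by a Stellar expression, not by code like this.

```python
from typing import Any, Callable, Dict, Optional
from urllib.parse import urlparse

Message = Dict[str, Any]


def parse_line(line: str) -> Optional[Message]:
    """Step 2: turn 'timestamp client_ip url' into a message (hypothetical format)."""
    parts = line.split()
    if len(parts) != 3:
        return None  # malformed; a real pipeline would report it rather than drop it
    timestamp, client_ip, url = parts
    return {"timestamp": timestamp, "ip_src_addr": client_ip, "url": url}


def strip_subdomains(host: str) -> str:
    """Naively keep the last two labels: 'www.example.com' -> 'example.com'.

    Real code needs a public-suffix list to handle names such as 'example.co.uk'.
    """
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def transform(message: Message,
              model: Callable[[Dict[str, str]], Dict[str, Any]]) -> Message:
    """Step 4: derive the model input, call the model, add its verdict as new fields."""
    host = urlparse(message["url"]).hostname or ""
    domain = strip_subdomains(host)
    verdict = model({"host": domain}).get("is_malicious")
    enriched = dict(message)  # leave the input message untouched
    enriched.update(full_hostname=host, domain_without_subdomains=domain,
                    is_malicious=verdict)
    return enriched
```

Passing the model in as a callable keeps the parsing and transformation logic independent of where the model runs: it could wrap an HTTP call to a deployed model or, in a test, any function that returns a dict.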
    
    Hopefully that'll clear some things up. I'm giving a talk about
    this next week at DataWorks Summit, so I'll be sure to follow up here with
    the deck. There's also a blog post coming out eventually that walks
    through this more directly.
    
    If I missed something or if something isn't clear yet, I'll be sure to keep
    at it. :)
    
    Best,
    
    Casey
    
    On Mon, Jun 5, 2017 at 1:21 PM, <smlabs@libero.it> wrote:
    
    > Hello Casey,
    >
    > your answer makes some things clearer, but not everything.
    >
    > My question about ML models was because somewhere on the web I read that
    > Metron comes with ML.
    > But maybe it's better to say that it supports ML models.
    >
    > If I understood correctly, I can run Metron in a virtual machine connected to
    > my network. With NiFi I can select the protocols/packets that I want to store
    > (similar to what Wireshark does).
    >
    > What I do not understand is how to feed the data into the ML algorithm.
    >
    > Could you explain a bit more, or point me to a tutorial that explains
    > the implementation process?
    >
    > For example, suppose I have an SVM algorithm, developed in Python using
    > scikit-learn, that I would like to test in Metron.
    >
    > How can I do that?
    >
    > Thank you and I'm sorry for the very basic question.
    >
    > Best Regards,
    >
    > Simone
    >
    > On 5 June 2017 at 18:45, Casey Stella <cestella@gmail.com> wrote:
    >
    > We do not currently ship any ML models with Metron, just the infrastructure
    > to deploy your own models and interact with those models from within
    > Metron. That being said, you might be interested in
    > https://gist.github.com/cestella/8dd83031b8898a732b6a5a60fce1b616
    > That's the code to take a DGA model written in scikit-learn from
    > https://github.com/ClickSecurity/data_hacking/tree/master/dga_detection
    > and make it suitable for deployment via MaaS.
    >
    > If you want more information about MaaS, I'll be giving a talk on it next
    > week at DataWorks Summit and that deck will be public.
    >
    > On Mon, Jun 5, 2017 at 12:09 PM, <smlabs@libero.it> wrote:
    >
    > Hello Simon,
    >
    > thank you for your prompt reply and for the link as well.
    >
    > I'm more comfortable with Python.
    >
    > May I ask if there is any example in Python that I could use as a template to
    > receive network packets and then implement the machine learning algorithm?
    >
    > Moreover, where can I find documentation about the ML algorithms already
    > implemented in Metron?
    >
    > Best Regards,
    >
    > Simone
    >
    > On 5 June 2017 at 18:00, Simon Elliston Ball <
    > simon@simonellistonball.com> wrote:
    >
    > Hi Simone, and welcome to the community.
    >
    > There are a number of extension points in Metron, the key ones being
    > around machine learning. I suggest taking a look at
    > https://github.com/apache/metron/tree/master/metron-analytics/metron-maas-service
    > for more information about Model as a Service (MaaS). This is the bit that
    > helps you add models in pretty much any language that will run in a YARN
    > container (Python, R and Spark models are probably the most popular).
    >
    > Hope that helps, and looking forward to hearing more about your
    > research, and any contributions you feel like adding to the community.
    >
    > Simon
    >
    > On 5 Jun 2017, at 16:54, smlabs@libero.it wrote:
    >
    > Dear community,
    >
    > my name is Simone and I'm a researcher in the field of
    > cybersecurity.
    >
    > I've just read about Apache Metron and I would like to ask:
    >
    >    - does it use machine learning or artificial intelligence?
    >    - can I extend the machine learning algorithms already present in
    >    Metron with my own?
    >    - which language do I have to use to extend Metron with my
    >    algorithms?
    >
    >    Thank you.
    >
    >    Best Regards,
    >
    >    Simone
    >
    >
    


