metron-dev mailing list archives

From sml...@libero.it
Subject Re: Question about the customization of Metron with my machine learning algo.
Date Wed, 07 Jun 2017 10:28:29 GMT
Hello Casey,

your explanations (and Matt's in the other email) helped me. However, if I may, I
need some more details.

My questions are both conceptual (Metron is a completely new tool for me) and
practical (e.g., I did not find any guideline that explains where an ML model should be
stored to run with Metron, i.e. in which folder).

* VM vs cluster:
o You pointed out the need to use a cluster. What is the main reason? Is it linked to performance,
i.e. the processing resources needed to run Metron?
o At this stage, in which I'm learning Metron and also running some experiments, I would install
it in a VM.
+ Can I install Metron on Ubuntu 16.04 using Vagrant? (implementation question)
* VM:
o Which Ubuntu version do I have to use? 16.04?
o Which version of Metron do I have to use?
o Which version of NiFi do I have to install?
o Is there any additional tool that I have to install?
* Model deployment:
o I would use NiFi as the tool to get data from my network.
+ Do you have any recommendation? (implementation question)
o Data collected with NiFi should be sent to Metron. Reading the Metron architecture page (https://cwiki.apache.org/confluence/display/METRON/Metron+Architecture),
it seems possible.
+ I'm a little confused about the data flow at this point. You pointed out two caveats about parsing
data and then feeding it to the ML model. Can you please explain a bit more? (conceptual/implementation
question).
o As my first test I would try a packet classifier ML model but, after your two caveats,
in which you said that only the second method is supported, I don't understand whether I could classify
packets that come from NiFi's probes.
+ Can you help me on this point?
* REST API (linked also to Matt's email):
o OK, I'm not so strong with this interface, so my questions will be really basic.
+ Reading your example: https://gist.github.com/cestella/8dd83031b8898a732b6a5a60fce1b616
I understand that I should develop my ML model in Python.
+ Can I reuse the file rest.py, pointing it to my new model?
* About the steps to follow: I copy your indications in RED below, and my questions are
in BLACK:
o Anyway, so for you to use your own ML model, you'd do the following:

1. Ingest the sensor data source that you want to ingest into a kafka topic --> Can I use
NiFi? Is it a transparent process for me, or is there some code to be written?
2. Create or reuse one of the existing parsers that we support to convert the data from your
data source --> I do not understand. Are you referring to Stellar? I don't understand what
Stellar is.
3. Create your model (see https://gist.github.com/cestella/8dd83031b8898a732b6a5a60fce1b616
as an example)
4. refer to your model from stellar
1. In the example I mentioned, we're doing that at https://github.com/apache/metron/tree/master/metron-analytics/metron-maas-service#adjust-configurations-for-squid-to-call-model
2. You might consider doing it in the enrichment topology, but to get you started, doing it
as a field transformation as in the example should suffice
* Dataworks summit:
o You said that your talk is public, didn't you?
+ Do you know if there is a link where I could follow it offline?
* Blog:
o Which blog are you referring to?
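
To make the data flow in steps 1 to 4 more concrete, here is a minimal conceptual sketch in Python. It is not Metron code: in Metron the parsing and enrichment run as topologies fed from Kafka, as described in the quoted reply below. The space-separated input format, the field names, and the `toy_model` rule are all invented for illustration; a real classifier would take the place of `toy_model`.

```python
import json


def parse(raw_line: str) -> dict:
    """Step 2: turn one raw sensor line into a flat record.

    The space-separated format used here is invented for this example.
    """
    timestamp, src_ip, dst_ip, length = raw_line.split()
    return {
        "timestamp": int(timestamp),
        "ip_src_addr": src_ip,
        "ip_dst_addr": dst_ip,
        "length": int(length),
    }


def toy_model(record: dict) -> bool:
    """Step 3: stand-in for a trained classifier (hypothetical rule)."""
    return record["length"] > 1500


def enrich(record: dict) -> dict:
    """Step 4: attach the model's verdict to a copy of the record."""
    enriched = dict(record)
    enriched["is_suspicious"] = toy_model(record)
    return enriched


def process(raw_line: str) -> str:
    """Parse, enrich, and serialise one message as JSON."""
    return json.dumps(enrich(parse(raw_line)), sort_keys=True)


# Step 1 would deliver raw lines like this one from the sensor.
print(process("1496831309 10.0.0.5 192.168.1.20 1800"))
```

The point of the sketch is only the ordering: the model sees parsed records, never the raw input.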

So, in summary, I would like to test an ML model for network packet classification. Most of my
questions aim to understand what I have to set up to get one VM that runs Metron.

At this stage, as a newbie to Metron, I would use Metron as a tool, focusing on the ML model
in Python.
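
For the Python side, a minimal sketch of a model wrapped as a small HTTP service, using only the standard library. It assumes the model is queried over HTTP with query parameters and answers with JSON; the `/apply` path, the `length` parameter, and the threshold rule in `classify` are hypothetical placeholders, not the actual interface of Metron or of the gist. A trained scikit-learn classifier would replace the body of `classify`.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def classify(packet_length: int) -> dict:
    """Placeholder for a trained model: flags unusually large packets."""
    return {"is_suspicious": packet_length > 1500}


def apply_model(query: str) -> tuple:
    """Turn a raw query string into (HTTP status, JSON-serialisable body)."""
    params = parse_qs(query)
    try:
        length = int(params["length"][0])
    except (KeyError, ValueError):
        return 400, {"error": "expected an integer 'length' parameter"}
    return 200, classify(length)


class ModelHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = urlparse(self.path)
        if url.path == "/apply":
            status, payload = apply_model(url.query)
        else:
            status, payload = 404, {"error": "unknown path"}
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 0) -> HTTPServer:
    """Bind to localhost; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), ModelHandler)
```

Calling `make_server(8080).serve_forever()` starts the service locally, after which a request to `/apply?length=2000` is answered with a JSON verdict. Keeping the scoring in `apply_model`, separate from the HTTP handler, makes it possible to test the model logic without opening a socket.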

Thank you in advance for your useful answers.

Best Regards,
Simone

> On 6 June 2017 at 19:43, Casey Stella <cestella@gmail.com> wrote:
> 
>     So, first off, it's not a basic question at all and thanks for asking it.  I'm sure if it's not clear to you, then it's not clear to many and bears some reinforcement and clarification.
> 
>         * Metron does indeed enable the deployment and use of machine learning models on data ingested into Metron
>         * Metron runs atop Hadoop (storm + kafka + hdfs + hbase), so you likely wouldn't run this successfully on a VM, but rather a cluster.  We do support running Metron for demonstration purposes and development purposes inside a VM, but that's not a production configuration, I'd like to make clear.
>     Models deployed via MaaS can be interacted with via Stellar on data ingested into Metron under a couple caveats.  There are two ways to ingest data into Metron:
>         * Via a packet capture sensor (fastcapa) to Kafka to the pcap storm topology, which writes directly to HDFS with no preamble or enrichment
>         * Via another, lower velocity sensor (e.g. bro for deep packet inspection or yaf for flow data) which is routed to a parser topology, then to enrichment and finally to indexing
>     We do not, at present, support interacting with models (or, indeed, any enrichment) on raw packet data (the first case above).  We do, however, support it on the second usecase.  The example at https://github.com/apache/metron/tree/master/metron-analytics/metron-maas-service#example demonstrates ingesting web proxy data and using a dummy machine learning model to pick out domains which are synthetic and likely to represent communication to a botnet (the DGA model in that example is crude and could easily be replaced with the example I posed earlier, btw).
> 
>     Anyway, so for you to use your own ML model, you'd do the following:
>        1. Ingest the sensor data source that you want to ingest into a kafka topic
>        2. Create or reuse one of the existing parsers that we support to convert the data from your data source
>        3. Create your model (see https://gist.github.com/cestella/8dd83031b8898a732b6a5a60fce1b616 as an example)
>        4. refer to your model from stellar
>              1. In the example I mentioned, we're doing that at https://github.com/apache/metron/tree/master/metron-analytics/metron-maas-service#adjust-configurations-for-squid-to-call-model
>              2. You might consider doing it in the enrichment topology, but to get you started, doing it as a field transformation as in the example should suffice
>     Hopefully that'll clear some things up.  I'm about to give a talk about this next week at Dataworks summit, so I'll be sure to follow-up here with the deck.  There's also a blog post that will eventually be going out with this walked through more directly.
> 
>     If I missed something or if something isn't clear yet, I'll be sure to keep at it. :)
> 
>     Best,
> 
>     Casey
> 
>     On Mon, Jun 5, 2017 at 1:21 PM, <smlabs@libero.it> wrote:
> 
> >         Hello Casey,
> > 
> >         your answer makes some things clearer, but not everything.
> > 
> >         My question about ML models was because somewhere on the web I read that Metron comes with ML.
> >         But maybe it's better to say that it supports ML models.
> > 
> >         If I understood well, I can run Metron in a virtual machine connected to my network. With NiFi I can select the protocols/packets that I would store (similarly to what Wireshark does).
> > 
> >         Then, I do not understand how to feed the data into the ML algorithm.
> > 
> >         Could you explain a bit more, or point me to a tutorial that explains the implementation process?
> > 
> >         For example, say I have an SVM algorithm that I would like to test in Metron, and that ML algorithm has been developed in Python using scikit-py.
> > 
> >         How can I do that?
> > 
> >         Thank you and I'm sorry for the very basic question.
> > 
> >         Best Regards,
> > 
> >         Simone
> > 
> > >             On 5 June 2017 at 18:45, Casey Stella <cestella@gmail.com> wrote:
> > > 
> > >             We do not ship any ML models currently with metron, just the infrastructure
> > >             to deploy your own models and interact with those models from within
> > >             Metron. That being said, you might be interested in
> > >             https://gist.github.com/cestella/8dd83031b8898a732b6a5a60fce1b616 That's
> > >             the code to take a DGA model written in scikit-learn from
> > >             https://github.com/ClickSecurity/data_hacking/tree/master/dga_detection and
> > >             make it suitable for deployment via MaaS.
> > > 
> > >             If you want more information about MaaS, I'll be giving a talk on it next
> > >             week at DataWorks Summit and that deck will be public.
> > > 
> > >             On Mon, Jun 5, 2017 at 12:09 PM, <smlabs@libero.it> wrote:
> > > 
> > > >                 Hello Simon,
> > > > 
> > > >                 thank you for your prompt reply and for the link as well.
> > > > 
> > > >                 I'm more comfortable with Python.
> > > > 
> > > >                 May I ask you if there is any example in Python that I could use as a template to
> > > >                 receive network packets and then implement the machine learning algorithm?
> > > > 
> > > >                 Moreover, where can I find documentation about the ML algorithms already
> > > >                 implemented in Metron?
> > > > 
> > > >                 Best Regards,
> > > > 
> > > >                 Simone
> > > > 
> > > > >                     On 5 June 2017 at 18:00, Simon Elliston Ball <simon@simonellistonball.com> wrote:
> > > > > 
> > > > >                     Hi Simone, and welcome to the community.
> > > > > 
> > > > >                     There are a number of extension points in Metron, the key ones being
> > > > >                     around machine learning. I suggest taking a look at
> > > > >                     https://github.com/apache/metron/tree/master/metron-analytics/metron-maas-service
> > > > >                     for more information about the model as a
> > > > >                     service. This is the bit that helps you add models in pretty much any
> > > > >                     language that will run in a yarn container (python, R and spark models are
> > > > >                     probably the most popular).
> > > > > 
> > > > >                     Hope that helps, and looking forward to hearing more about your
> > > > >                     research, and any contributions you feel like adding to the community.
> > > > > 
> > > > >                     Simon
> > > > > 
> > > > >                     On 5 Jun 2017, at 16:54, smlabs@libero.it wrote:
> > > > > 
> > > > > >                         Dear community,
> > > > > > 
> > > > > >                         my name is Simone and I'm a researcher in the field of
> > > > > >                         cybersecurity.
> > > > > > 
> > > > > >                         I've just read about Apache Metron and I would like to ask:
> > > > > > 
> > > > > >                             * does it use machine learning or artificial intelligence?
> > > > > > 
> > > > > >                             * can I extend the machine learning algorithms already present in
> > > > > >                               Metron with my own?
> > > > > > 
> > > > > >                             * which language do I have to use to extend Metron
> > > > > >                               with my algorithms?
> > > > > > 
> > > > > >                               Thank you.
> > > > > > 
> > > > > >                               Best Regards,
> > > > > > 
> > > > > >                               Simone
> > > > > > 
