metron-dev mailing list archives

From "Zeolla@GMail.com" <zeo...@gmail.com>
Subject Re: [DISCUSS] Metron assessment tool
Date Tue, 12 Jul 2016 21:41:12 GMT
I can definitely give it a shot.  A kickstart would be appreciated.

Jon

On Tue, Jul 12, 2016, 17:17 James Sirota <jsirota@apache.org> wrote:

> Jon,
>
> Just filed METRON-318.  Is this something you would like to work on?
> Would you like help from us to get started?
>
> Thanks,
> James
>
> 12.07.2016, 11:53, "Zeolla@GMail.com" <zeolla@gmail.com>:
> > Hi All,
> >
> > Has there been any additional discussion or development regarding this? I
> > took a brief look around the Jira and didn't see anything, but I may have
> > missed it. Thanks,
> >
> > Jon
> >
> > On Fri, Apr 15, 2016 at 2:01 PM Nick Allen <nick@nickallen.org> wrote:
> >
> >>  I definitely agree that you need this level of understanding of your
> >>  cluster. It definitely could work the way that you describe.
> >>
> >>  I was thinking of it slightly differently though. The metrics for this
> >>  purpose (understanding the performance of an existing cluster) should
> >>  come from the actual sensors themselves. For example, I need to
> >>  instrument the packet capture process so that it kicks out
> >>  time-series-ish metrics that you can monitor in a dashboard over time.
> >>
> >>  On Fri, Apr 15, 2016 at 1:40 PM, Zeolla@GMail.com <zeolla@gmail.com>
> >>  wrote:
> >>
> >>  > However, it would be handy to have something like this perpetually
> >>  > running so you know when to scale up/out/down/in a cluster.
> >>  >
> >>  > On Fri, Apr 15, 2016, 13:35 Nick Allen <nick@nickallen.org> wrote:
> >>  >
> >>  > > I think it is slightly different. I don't even want to install
> >>  > > minimal Kafka infrastructure (Look ma, no Kafka!).
> >>  > >
> >>  > > The exact implementation would differ based on the data inputs that
> >>  > > you are trying to measure, but for example...
> >>  > >
> >>  > >    - To understand raw packet rates, I would have a specialized
> >>  > >    sensor that counts packets and size on the wire. It doesn't do
> >>  > >    anything more than that.
> >>  > >    - To understand Netflow rates, it would watch for Netflow
> >>  > >    packets and count those.
> >>  > >    - To understand sizing around application logs, a sensor would
> >>  > >    watch for Syslog packets and count those.
> >>  > >
> >>  > > The implementation would be more similar to raw packet capture
> >>  > > with some DPI. No Hadoop-y components required.
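> >>  > >
> >>  > > [A minimal sketch of what such a standalone counting sensor could
> >>  > > look like, assuming plain JDK UDP sockets; the port argument and
> >>  > > CSV output are illustrative, not part of any proposal above:]
> >>  > >
> >>  > >   // Counts packets and bytes arriving on a UDP port (e.g. Syslog
> >>  > >   // on 514, Netflow on 2055) and prints one sample per second.
> >>  > >   // No Kafka, no Hadoop -- it counts and does nothing more.
> >>  > >   import java.net.DatagramPacket;
> >>  > >   import java.net.DatagramSocket;
> >>  > >   import java.util.concurrent.atomic.AtomicLong;
> >>  > >
> >>  > >   public class RateCounter {
> >>  > >       public static void main(String[] args) throws Exception {
> >>  > >           int port = Integer.parseInt(args[0]);
> >>  > >           AtomicLong packets = new AtomicLong();
> >>  > >           AtomicLong bytes = new AtomicLong();
> >>  > >
> >>  > >           // Reporter thread: one time-series-ish sample/second.
> >>  > >           Thread reporter = new Thread(() -> {
> >>  > >               while (true) {
> >>  > >                   try { Thread.sleep(1000); }
> >>  > >                   catch (InterruptedException e) { return; }
> >>  > >                   System.out.printf("%d,%d,%d%n",
> >>  > >                       System.currentTimeMillis(),
> >>  > >                       packets.getAndSet(0), bytes.getAndSet(0));
> >>  > >               }
> >>  > >           });
> >>  > >           reporter.setDaemon(true);
> >>  > >           reporter.start();
> >>  > >
> >>  > >           try (DatagramSocket socket = new DatagramSocket(port)) {
> >>  > >               byte[] buf = new byte[65535];
> >>  > >               DatagramPacket p =
> >>  > >                   new DatagramPacket(buf, buf.length);
> >>  > >               while (true) {
> >>  > >                   socket.receive(p);
> >>  > >                   packets.incrementAndGet();
> >>  > >                   bytes.addAndGet(p.getLength());
> >>  > >               }
> >>  > >           }
> >>  > >       }
> >>  > >   }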
> >>  > >
> >>  > >
> >>  > >
> >>  > > On Fri, Apr 15, 2016 at 1:10 PM, James Sirota
> >>  > > <jsirota@hortonworks.com> wrote:
> >>  > >
> >>  > > > So this is exactly what I am proposing. Calculate the metrics on
> >>  > > > the fly without landing any data in the cluster. The problem is
> >>  > > > that enterprise data volumes are so large you can’t just point
> >>  > > > them at a Java or a C++ program or sensor. You either need an
> >>  > > > existing minimal Kafka infrastructure to take that load, or you
> >>  > > > have to sample the data.
> >>  > > >
> >>  > > > Thanks,
> >>  > > > James
> >>  > > >
> >>  > > >
> >>  > > >
> >>  > > >
> >>  > > > On 4/15/16, 9:54 AM, "Nick Allen" <nick@nickallen.org> wrote:
> >>  > > >
> >>  > > > >Or we have the assessment tool not actually land any data. The
> >>  > > > >assessment tool becomes a 'sensor' in its own right. You just
> >>  > > > >point the input data sets at the assessment tool, it builds
> >>  > > > >metrics on the input (for example: count the number of packets
> >>  > > > >per second) and then we use those metrics to estimate cluster
> >>  > > > >size.
> >>  > > > >
> >>  > > > >On Wed, Apr 13, 2016 at 5:45 PM, James Sirota
> >>  > > > ><jsirota@hortonworks.com> wrote:
> >>  > > > >
> >>  > > > >> That’s an excellent point. So I think there are three ways
> >>  > > > >> forward.
> >>  > > > >>
> >>  > > > >> One is we can assume that there has to be at least a minimal
> >>  > > > >> infrastructure in place (at least a subset of Kafka and Storm
> >>  > > > >> resources) to run a full-scale assessment. If you point
> >>  > > > >> something that blasts millions of messages per second at
> >>  > > > >> something like ActiveMQ you are going to blow up. So the
> >>  > > > >> infrastructure to at least receive these kinds of message
> >>  > > > >> volumes has to exist as a pre-requisite. There is no way to
> >>  > > > >> get around that.
> >>  > > > >>
> >>  > > > >> The second approach I see is sampling. Sampling is a lot less
> >>  > > > >> precise and you can miss peaks that fall outside of your
> >>  > > > >> sampling windows. But the obvious benefit is that you don’t
> >>  > > > >> need a cluster to process these streams. You can probably
> >>  > > > >> perform most of your calculations with a multithreaded Java
> >>  > > > >> program. Sampling poses a few design challenges. First, where
> >>  > > > >> do you sample? Do you sample on the sensor? (The implication
> >>  > > > >> here is that we have to program some sort of sampling
> >>  > > > >> capability into our sensors.) Do you sample on transport?
> >>  > > > >> (Maybe a Flume interceptor or NiFi processor.) There is also
> >>  > > > >> a question of what the sampling rate should be. Not knowing
> >>  > > > >> the statistical properties of a stream ahead of time, it’s
> >>  > > > >> hard to make that call.
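> >>  > > > >>
> >>  > > > >> [A hedged sketch of the multithreaded-program idea: reservoir
> >>  > > > >> sampling keeps a uniform random sample of a stream without
> >>  > > > >> knowing its length or statistics ahead of time, which
> >>  > > > >> sidesteps picking a sampling rate up front; names here are
> >>  > > > >> illustrative:]
> >>  > > > >>
> >>  > > > >>   import java.util.ArrayList;
> >>  > > > >>   import java.util.List;
> >>  > > > >>   import java.util.concurrent.ThreadLocalRandom;
> >>  > > > >>
> >>  > > > >>   // Algorithm R: after N items, every item seen so far has
> >>  > > > >>   // probability k/N of being in the sample.
> >>  > > > >>   public class Reservoir<T> {
> >>  > > > >>       private final List<T> sample;
> >>  > > > >>       private final int k;   // sample size, e.g. 10000
> >>  > > > >>       private long seen = 0;
> >>  > > > >>
> >>  > > > >>       public Reservoir(int k) {
> >>  > > > >>           this.k = k;
> >>  > > > >>           this.sample = new ArrayList<>(k);
> >>  > > > >>       }
> >>  > > > >>
> >>  > > > >>       public synchronized void offer(T item) {
> >>  > > > >>           seen++;
> >>  > > > >>           if (sample.size() < k) {
> >>  > > > >>               sample.add(item);
> >>  > > > >>           } else {
> >>  > > > >>               long j = ThreadLocalRandom.current()
> >>  > > > >>                            .nextLong(seen);
> >>  > > > >>               if (j < k) sample.set((int) j, item);
> >>  > > > >>           }
> >>  > > > >>       }
> >>  > > > >>
> >>  > > > >>       public synchronized List<T> snapshot() {
> >>  > > > >>           return new ArrayList<>(sample);
> >>  > > > >>       }
> >>  > > > >>   }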
> >>  > > > >>
> >>  > > > >> The third option I think is an MR job. We can blast the data
> >>  > > > >> into HDFS and then go over it with MR to derive the metrics
> >>  > > > >> we are looking for. Then we don’t have to sample or set up
> >>  > > > >> expensive infrastructure to receive a deluge of data. But
> >>  > > > >> then we run into the chicken-and-egg problem that in order to
> >>  > > > >> size your HDFS you need to have data in HDFS. Ideally you
> >>  > > > >> need to capture at least one full week’s worth of logs,
> >>  > > > >> because patterns throughout the day as well as on each day of
> >>  > > > >> the week have different statistical properties. So you need
> >>  > > > >> off peak, on peak, weekdays and weekends to derive these
> >>  > > > >> stats in batch.
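> >>  > > > >>
> >>  > > > >> [A hedged sketch of that batch option, assuming vanilla
> >>  > > > >> Hadoop MapReduce over raw logs already landed in HDFS; class
> >>  > > > >> names and the single "message-size" key are illustrative:]
> >>  > > > >>
> >>  > > > >>   import java.io.IOException;
> >>  > > > >>   import org.apache.hadoop.conf.Configuration;
> >>  > > > >>   import org.apache.hadoop.fs.Path;
> >>  > > > >>   import org.apache.hadoop.io.LongWritable;
> >>  > > > >>   import org.apache.hadoop.io.Text;
> >>  > > > >>   import org.apache.hadoop.mapreduce.Job;
> >>  > > > >>   import org.apache.hadoop.mapreduce.Mapper;
> >>  > > > >>   import org.apache.hadoop.mapreduce.Reducer;
> >>  > > > >>   import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
> >>  > > > >>   import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
> >>  > > > >>
> >>  > > > >>   public class SizeStatsJob {
> >>  > > > >>       // A real tool would key by hour-of-week parsed from
> >>  > > > >>       // the event timestamp, to expose on/off-peak patterns.
> >>  > > > >>       public static class M extends
> >>  > > > >>               Mapper<LongWritable, Text, Text, LongWritable> {
> >>  > > > >>           private static final Text KEY = new Text("message-size");
> >>  > > > >>           protected void map(LongWritable off, Text line,
> >>  > > > >>                   Context ctx)
> >>  > > > >>                   throws IOException, InterruptedException {
> >>  > > > >>               ctx.write(KEY, new LongWritable(line.getLength()));
> >>  > > > >>           }
> >>  > > > >>       }
> >>  > > > >>
> >>  > > > >>       public static class R extends
> >>  > > > >>               Reducer<Text, LongWritable, Text, Text> {
> >>  > > > >>           protected void reduce(Text key,
> >>  > > > >>                   Iterable<LongWritable> sizes, Context ctx)
> >>  > > > >>                   throws IOException, InterruptedException {
> >>  > > > >>               long n = 0, sum = 0, max = 0;
> >>  > > > >>               for (LongWritable s : sizes) {
> >>  > > > >>                   n++;
> >>  > > > >>                   sum += s.get();
> >>  > > > >>                   max = Math.max(max, s.get());
> >>  > > > >>               }
> >>  > > > >>               ctx.write(key, new Text("count=" + n
> >>  > > > >>                   + " avgBytes=" + (sum / Math.max(n, 1))
> >>  > > > >>                   + " maxBytes=" + max));
> >>  > > > >>           }
> >>  > > > >>       }
> >>  > > > >>
> >>  > > > >>       public static void main(String[] args) throws Exception {
> >>  > > > >>           Job job = Job.getInstance(new Configuration(),
> >>  > > > >>                                     "size-stats");
> >>  > > > >>           job.setJarByClass(SizeStatsJob.class);
> >>  > > > >>           job.setMapperClass(M.class);
> >>  > > > >>           job.setReducerClass(R.class);
> >>  > > > >>           job.setMapOutputKeyClass(Text.class);
> >>  > > > >>           job.setMapOutputValueClass(LongWritable.class);
> >>  > > > >>           job.setOutputKeyClass(Text.class);
> >>  > > > >>           job.setOutputValueClass(Text.class);
> >>  > > > >>           FileInputFormat.addInputPath(job, new Path(args[0]));
> >>  > > > >>           FileOutputFormat.setOutputPath(job, new Path(args[1]));
> >>  > > > >>           System.exit(job.waitForCompletion(true) ? 0 : 1);
> >>  > > > >>       }
> >>  > > > >>   }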
> >>  > > > >>
> >>  > > > >> Any other design ideas?
> >>  > > > >>
> >>  > > > >> Thanks,
> >>  > > > >> James
> >>  > > > >>
> >>  > > > >>
> >>  > > > >>
> >>  > > > >>
> >>  > > > >>
> >>  > > > >> On 4/13/16, 1:59 PM, "Nick Allen" <nick@nickallen.org> wrote:
> >>  > > > >>
> >>  > > > >> >If the tool starts at Kafka, the user would have to already
> >>  > > > >> >have committed to the investment in the infrastructure and
> >>  > > > >> >the time to set up the sensors that feed Kafka, and Kafka
> >>  > > > >> >itself. Maybe it would need to be further upstream?
> >>  > > > >> >On Apr 13, 2016 1:05 PM, "James Sirota"
> >>  > > > >> ><jsirota@hortonworks.com> wrote:
> >>  > > > >> >
> >>  > > > >> >> Hi George,
> >>  > > > >> >>
> >>  > > > >> >> This article describes micro-tuning of an existing
> >>  > > > >> >> cluster. What I am proposing is a level up from that. When
> >>  > > > >> >> you start with Metron, how do you even know how many nodes
> >>  > > > >> >> you need? And of these nodes, how many do you allocate to
> >>  > > > >> >> Storm, indexing, storage? How much storage do you need?
> >>  > > > >> >> Tuning would be the next step in the process, but this
> >>  > > > >> >> tool would answer more fundamental questions about what a
> >>  > > > >> >> Metron deployment should look like given the number of
> >>  > > > >> >> telemetries and the retention policies of the enterprise.
> >>  > > > >> >>
> >>  > > > >> >> The best way to get this data (in my opinion) is to have
> >>  > > > >> >> some tool that we can plug into Metron’s point of ingest
> >>  > > > >> >> (Kafka topics) and run for about a week or a month to
> >>  > > > >> >> figure that out and spit out the relevant metrics. Based
> >>  > > > >> >> on these metrics we can figure out the fundamental things
> >>  > > > >> >> about what Metron should look like. Tuning would be the
> >>  > > > >> >> next step.
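> >>  > > > >> >>
> >>  > > > >> >> [A minimal sketch of such a plug-in tool, assuming a
> >>  > > > >> >> recent kafka-clients consumer API; the broker address,
> >>  > > > >> >> topic name and one-minute window are placeholders:]
> >>  > > > >> >>
> >>  > > > >> >>   import java.time.Duration;
> >>  > > > >> >>   import java.util.List;
> >>  > > > >> >>   import java.util.Properties;
> >>  > > > >> >>   import org.apache.kafka.clients.consumer.*;
> >>  > > > >> >>
> >>  > > > >> >>   // Attaches to an existing ingest topic, lands nothing,
> >>  > > > >> >>   // and prints message count and byte volume per minute.
> >>  > > > >> >>   public class IngestProfiler {
> >>  > > > >> >>       public static void main(String[] args) {
> >>  > > > >> >>           Properties p = new Properties();
> >>  > > > >> >>           p.put("bootstrap.servers", "broker:9092");
> >>  > > > >> >>           p.put("group.id", "metron-assessment");
> >>  > > > >> >>           p.put("key.deserializer",
> >>  > > > >> >>               "org.apache.kafka.common.serialization"
> >>  > > > >> >>               + ".ByteArrayDeserializer");
> >>  > > > >> >>           p.put("value.deserializer",
> >>  > > > >> >>               "org.apache.kafka.common.serialization"
> >>  > > > >> >>               + ".ByteArrayDeserializer");
> >>  > > > >> >>           try (KafkaConsumer<byte[], byte[]> c =
> >>  > > > >> >>                    new KafkaConsumer<>(p)) {
> >>  > > > >> >>               c.subscribe(List.of("yaf")); // placeholder
> >>  > > > >> >>               long n = 0, bytes = 0;
> >>  > > > >> >>               long start = System.currentTimeMillis();
> >>  > > > >> >>               while (true) {
> >>  > > > >> >>                   for (ConsumerRecord<byte[], byte[]> r :
> >>  > > > >> >>                           c.poll(Duration.ofSeconds(1))) {
> >>  > > > >> >>                       n++;
> >>  > > > >> >>                       bytes += r.serializedValueSize();
> >>  > > > >> >>                   }
> >>  > > > >> >>                   long now = System.currentTimeMillis();
> >>  > > > >> >>                   if (now - start >= 60000) {
> >>  > > > >> >>                       System.out.printf("%d,%d,%d%n",
> >>  > > > >> >>                           now, n, bytes);
> >>  > > > >> >>                       n = 0; bytes = 0; start = now;
> >>  > > > >> >>                   }
> >>  > > > >> >>               }
> >>  > > > >> >>           }
> >>  > > > >> >>       }
> >>  > > > >> >>   }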
> >>  > > > >> >>
> >>  > > > >> >> Thanks,
> >>  > > > >> >> James
> >>  > > > >> >>
> >>  > > > >> >>
> >>  > > > >> >>
> >>  > > > >> >>
> >>  > > > >> >> On 4/13/16, 9:52 AM, "George Vetticaden"
> >>  > > > >> >> <gvetticaden@hortonworks.com> wrote:
> >>  > > > >> >>
> >>  > > > >> >> >I have used the following Kafka and Storm Best Practices
> >>  > > > >> >> >guide at numerous customer implementations.
> >>  > > > >> >> >
> >>  > > > >> >> >https://community.hortonworks.com/articles/550/unofficial-storm-and-kafka-best-practices-guide.html
> >>  > > > >> >> >
> >>  > > > >> >> >
> >>  > > > >> >> >We need to have something similar and prescriptive for
> >>  > > > >> >> >Metron, based on:
> >>  > > > >> >> >1. What data sources are we enabling
> >>  > > > >> >> >2. What enrichment services are we enabling
> >>  > > > >> >> >3. What threat intel services are we enabling
> >>  > > > >> >> >4. What are we indexing into Solr/Elastic, and for how long
> >>  > > > >> >> >5. What are we persisting into HDFS
> >>  > > > >> >> >
> >>  > > > >> >> >Ideally, the Metron assessment tool combined with an
> >>  > > > >> >> >introspection of the user’s Ansible configuration should
> >>  > > > >> >> >drive what Ambari blueprint type and configuration should
> >>  > > > >> >> >be used when the cluster is spun up and the Storm
> >>  > > > >> >> >topology is deployed.
> >>  > > > >> >> >
> >>  > > > >> >> >
> >>  > > > >> >> >--
> >>  > > > >> >> >George Vetticaden
> >>  > > > >> >> >Principal, COE
> >>  > > > >> >> >gvetticaden@hortonworks.com
> >>  > > > >> >> >(630) 909-9138
> >>  > > > >> >> >
> >>  > > > >> >> >
> >>  > > > >> >> >
> >>  > > > >> >> >
> >>  > > > >> >> >
> >>  > > > >> >> >On 4/13/16, 11:40 AM, "George Vetticaden"
> >>  > > > >> >> ><gvetticaden@hortonworks.com> wrote:
> >>  > > > >> >> >
> >>  > > > >> >> >>+1 to James' suggestion.
> >>  > > > >> >> >>We also need to consider not just the data volume and
> >>  > > > >> >> >>storage requirements for proper cluster sizing, but the
> >>  > > > >> >> >>processing requirements as well. Given that in the new
> >>  > > > >> >> >>architecture we have moved to a single enrichment
> >>  > > > >> >> >>topology that will support all data sources, proper
> >>  > > > >> >> >>sizing of the enrichment topology will be even more
> >>  > > > >> >> >>crucial to maintaining SLAs and HA requirements. The
> >>  > > > >> >> >>following key questions will apply to each parser
> >>  > > > >> >> >>topology and to the single enrichment topology:
> >>  > > > >> >> >>
> >>  > > > >> >> >>1. Number of workers?
> >>  > > > >> >> >>2. Number of workers per machine?
> >>  > > > >> >> >>3. Size of each worker (in memory)?
> >>  > > > >> >> >>4. Supervisor memory settings?
> >>  > > > >> >> >>
> >>  > > > >> >> >>The assessment tool should also be used to size
> >>  > > > >> >> >>topologies correctly.
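> >>  > > > >> >> >>
> >>  > > > >> >> >>[A rough sketch of how those four knobs map onto Storm
> >>  > > > >> >> >>configuration, assuming Storm 1.x's org.apache.storm
> >>  > > > >> >> >>Config API; the numbers are placeholders an assessment
> >>  > > > >> >> >>tool would fill in from measured rates:]
> >>  > > > >> >> >>
> >>  > > > >> >> >>  import org.apache.storm.Config;
> >>  > > > >> >> >>
> >>  > > > >> >> >>  public class SizingConfig {
> >>  > > > >> >> >>      public static Config enrichmentTopology() {
> >>  > > > >> >> >>          Config conf = new Config();
> >>  > > > >> >> >>          // 1. Number of workers for the topology.
> >>  > > > >> >> >>          conf.setNumWorkers(4);
> >>  > > > >> >> >>          // 3. Per-worker heap size.
> >>  > > > >> >> >>          conf.put(Config.TOPOLOGY_WORKER_CHILDOPTS,
> >>  > > > >> >> >>                   "-Xmx2048m");
> >>  > > > >> >> >>          // 2. Workers per machine and 4. supervisor
> >>  > > > >> >> >>          // memory live in storm.yaml
> >>  > > > >> >> >>          // (supervisor.slots.ports,
> >>  > > > >> >> >>          // supervisor.childopts), not topology config.
> >>  > > > >> >> >>          return conf;
> >>  > > > >> >> >>      }
> >>  > > > >> >> >>  }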
> >>  > > > >> >> >>
> >>  > > > >> >> >>Tuning Kafka, HBase and Solr/Elastic should also be
> >>  > > > >> >> >>governed by the Metron assessment tool.
> >>  > > > >> >> >>
> >>  > > > >> >> >>
> >>  > > > >> >> >>--
> >>  > > > >> >> >>George Vetticaden
> >>  > > > >> >> >>
> >>  > > > >> >> >>
> >>  > > > >> >> >>
> >>  > > > >> >> >>
> >>  > > > >> >> >>
> >>  > > > >> >> >>
> >>  > > > >> >> >>
> >>  > > > >> >> >>On 4/13/16, 11:28 AM, "James Sirota"
> >>  > > > >> >> >><jsirota@hortonworks.com> wrote:
> >>  > > > >> >> >>
> >>  > > > >> >> >>>Prior to adoption of Metron, each adopting entity
> >>  > > > >> >> >>>needs to guesstimate its data volume and data storage
> >>  > > > >> >> >>>requirements so it can size its cluster properly. I
> >>  > > > >> >> >>>propose the creation of an assessment tool that can
> >>  > > > >> >> >>>plug in to a Kafka topic for a given telemetry and over
> >>  > > > >> >> >>>time produce statistics for ingest volumes and storage
> >>  > > > >> >> >>>requirements. The idea is that prior to adopting Metron
> >>  > > > >> >> >>>someone can set up all the feeds and Kafka topics, but
> >>  > > > >> >> >>>instead of deploying Metron right away they would
> >>  > > > >> >> >>>deploy this tool. This tool would then produce
> >>  > > > >> >> >>>statistics for data ingest/storage requirements, and
> >>  > > > >> >> >>>all relevant information needed for cluster sizing.
> >>  > > > >> >> >>>
> >>  > > > >> >> >>>Some of the metrics that can be recorded are:
> >>  > > > >> >> >>>
> >>  > > > >> >> >>> * Number of system events per second (average, max,
> >>  > > > >> >> >>>   mean, standard dev)
> >>  > > > >> >> >>> * Message size (average, max, mean, standard dev)
> >>  > > > >> >> >>> * Average number of peaks
> >>  > > > >> >> >>> * Duration of peaks (average, max, mean, standard dev)
> >>  > > > >> >> >>>
> >>  > > > >> >> >>>If the parser for a telemetry exists, the tool can
> >>  > > > >> >> >>>produce additional statistics:
> >>  > > > >> >> >>>
> >>  > > > >> >> >>> * Number of keys/fields parsed (average, max, mean,
> >>  > > > >> >> >>>   standard dev)
> >>  > > > >> >> >>> * Length of field parsed (average, max, mean,
> >>  > > > >> >> >>>   standard dev)
> >>  > > > >> >> >>> * Length of key parsed (average, max, mean, standard
> >>  > > > >> >> >>>   dev)
> >>  > > > >> >> >>>
> >>  > > > >> >> >>>The tool can run for a week or a month and produce
> >>  > > > >> >> >>>these kinds of statistics. Then, once the statistics
> >>  > > > >> >> >>>are available, we can come up with guidance
> >>  > > > >> >> >>>documentation for a recommended cluster setup.
> >>  > > > >> >> >>>Otherwise it’s hard to properly size a cluster and set
> >>  > > > >> >> >>>up streaming parallelism without knowing these metrics.
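> >>  > > > >> >> >>>
> >>  > > > >> >> >>>[A small sketch of how those statistics could be
> >>  > > > >> >> >>>computed in one pass, using Welford's online algorithm
> >>  > > > >> >> >>>so a week of samples never has to be held in memory;
> >>  > > > >> >> >>>the class name is illustrative:]
> >>  > > > >> >> >>>
> >>  > > > >> >> >>>  // Running average, max and standard deviation for
> >>  > > > >> >> >>>  // events/sec, message size, peak duration, etc.
> >>  > > > >> >> >>>  public class RunningStats {
> >>  > > > >> >> >>>      private long n = 0;
> >>  > > > >> >> >>>      private double mean = 0, m2 = 0;
> >>  > > > >> >> >>>      private double max = Double.NEGATIVE_INFINITY;
> >>  > > > >> >> >>>
> >>  > > > >> >> >>>      public void add(double x) {
> >>  > > > >> >> >>>          n++;
> >>  > > > >> >> >>>          double delta = x - mean;
> >>  > > > >> >> >>>          mean += delta / n;
> >>  > > > >> >> >>>          m2 += delta * (x - mean); // stable update
> >>  > > > >> >> >>>          if (x > max) max = x;
> >>  > > > >> >> >>>      }
> >>  > > > >> >> >>>
> >>  > > > >> >> >>>      public double mean() { return mean; }
> >>  > > > >> >> >>>      public double max() { return max; }
> >>  > > > >> >> >>>      public double stddev() {
> >>  > > > >> >> >>>          return n > 1
> >>  > > > >> >> >>>              ? Math.sqrt(m2 / (n - 1)) : 0;
> >>  > > > >> >> >>>      }
> >>  > > > >> >> >>>  }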
> >>  > > > >> >> >>>
> >>  > > > >> >> >>>
> >>  > > > >> >> >>>Thoughts/ideas?
> >>  > > > >> >> >>>
> >>  > > > >> >> >>>Thanks,
> >>  > > > >> >> >>>James
> >>  > > > >> >> >>
> >>  > > > >> >> >>
> >>  > > > >> >> >
> >>  > > > >> >>
> >>  > > > >>
> >>  > > > >
> >>  > > > >
> >>  > > > >
> >>  > > > >--
> >>  > > > >Nick Allen <nick@nickallen.org>
> >>  > > >
> >>  > >
> >>  > >
> >>  > >
> >>  > > --
> >>  > > Nick Allen <nick@nickallen.org>
> >>  > >
> >>  > --
> >>  >
> >>  > Jon
> >>  >
> >>
> >>  --
> >>  Nick Allen <nick@nickallen.org>
> > --
> >
> > Jon
>
> -------------------
> Thank you,
>
> James Sirota
> PPMC- Apache Metron (Incubating)
> jsirota AT apache DOT org
>
-- 

Jon
