metron-dev mailing list archives

From "Zeolla@GMail.com" <zeo...@gmail.com>
Subject Re: [DISCUSS] Metron assessment tool
Date Fri, 15 Jul 2016 11:35:15 GMT
That's roughly what I was thinking as well.

Time slices:
- 1 minute (smallest unit; should catch the peaks and valleys of batch
jobs, such as crons, that send to Kafka)
- 1 hour (a normal index size)
- 4 hours (half of a typical SOC shift)
- 24 hours (1 day)
- 168 hours (1 week)

Stats for 1min:
- Number of messages grouped by sensor.
- Size of messages grouped by sensor.

Stats for each time slice > 1min, with 1 min accuracy:
- Number of messages (min, 1st quartile, mean, median, 3rd quartile, max,
std dev) grouped by sensor.
- Size of messages (min, 1st quartile, mean, median, 3rd quartile, max, std
dev) grouped by sensor.

That should give us the typical Volume/Velocity/Variety information that
people like to see.  Is there anything I'm missing that could be helpful?  I
don't have much experience with scoping Hadoop clusters in general.
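
To make the rollups concrete, here is a rough, untested sketch of the kind
of consumer I have in mind for the 1-minute stats.  It assumes kafka-clients
2.x, assumes the sensor name rides on the Kafka message key, and every class
and method name below is just a placeholder:

import java.time.Duration;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class TopicProfiler {
  // sensor -> minute bucket (record timestamp / 60000) -> {message count, total bytes}
  private final Map<String, TreeMap<Long, long[]>> perMinute = new HashMap<>();

  public void profile(KafkaConsumer<String, byte[]> consumer, String topic) {
    consumer.subscribe(Collections.singletonList(topic));
    while (true) {
      ConsumerRecords<String, byte[]> records = consumer.poll(Duration.ofSeconds(1));
      for (ConsumerRecord<String, byte[]> r : records) {
        long minute = r.timestamp() / 60_000L;
        String sensor = r.key() != null ? r.key() : "unknown"; // assumption: key = sensor
        long[] bin = perMinute
            .computeIfAbsent(sensor, s -> new TreeMap<>())
            .computeIfAbsent(minute, m -> new long[2]);
        bin[0]++;                   // messages in this sensor-minute
        bin[1] += r.value().length; // bytes in this sensor-minute
      }
    }
  }

  // Summarize per-minute message counts over a window: min, ~1st quartile,
  // mean, ~median, ~3rd quartile, max, std dev.  Quartiles are nearest-rank
  // approximations.
  public static double[] summarize(Collection<long[]> bins) {
    double[] c = bins.stream().mapToDouble(b -> b[0]).sorted().toArray();
    if (c.length == 0) return new double[7];
    double mean = Arrays.stream(c).average().getAsDouble();
    double var = Arrays.stream(c).map(x -> (x - mean) * (x - mean))
                       .average().getAsDouble();
    return new double[] { c[0], c[c.length / 4], mean, c[c.length / 2],
                          c[(3 * c.length) / 4], c[c.length - 1], Math.sqrt(var) };
  }
}

For the bigger slices (1h/4h/24h/168h), summarize() would just get the
subMap of a sensor's TreeMap covering that window, and message sizes get
the identical treatment via bin[1].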

Some outstanding items which I think should be addressed as future
enhancements:
1.  There needs to be a way to generally gauge the overhead of
enrichments.  I'm not sure of the best way to do this given the current
state of the project.
2.  This may not be easy enough to set up in a large environment due to
scale.  The fix would be to introduce stats based on sampling; a rough
sketch follows.
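
For the sampling idea in 2, I imagine plain Bernoulli sampling would get us
most of the way, at the cost of the precision James mentions below.  A
hypothetical, untested sketch (the class name and rate are made up):

import java.util.concurrent.ThreadLocalRandom;

// Bernoulli sampler: inspect each message with probability p, then scale
// the observed counts back up by 1/p to estimate the true volume.
public class MessageSampler {
  private final double p; // e.g. 0.01 to inspect ~1% of traffic
  private long sampled = 0;

  public MessageSampler(double p) { this.p = p; }

  // Returns true if the caller should compute full stats on this message.
  public boolean offer(byte[] message) {
    if (ThreadLocalRandom.current().nextDouble() >= p) {
      return false; // skipped, but accounted for by the 1/p scaling below
    }
    sampled++;
    return true;
  }

  // Estimated true message count for the period observed so far.
  public long estimatedCount() {
    return Math.round(sampled / p);
  }
}

Peaks shorter than the effective sample window would still get smoothed
over, so we would probably keep the cheap 1-minute counters at full
fidelity and only sample the more expensive per-message work.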

Thoughts?

Jon

On Tue, Jul 12, 2016, 17:53 James Sirota <jsirota@apache.org> wrote:

> Per the Apache Way, it would be desirable to put an architecture proposal
> together for the community to take a look at before implementing.  I would
> propose a simple Storm topology that attaches to a Kafka topic and records
> statistics such as # of messages and total throughput for n-sized time
> bins.  What specific requirements do you have in mind?
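
[Interjecting inline: for the n-sized time bins, I would picture the
counting bolt looking roughly like the sketch below -- untested, all names
hypothetical, assuming Storm 1.x tick tuples and that the spout emits the
raw message bytes as field 0.]

import java.util.HashMap;
import java.util.Map;
import org.apache.storm.Config;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.TupleUtils;

public class BinStatsBolt extends BaseBasicBolt {
  private long count = 0, bytes = 0;

  @Override
  public Map<String, Object> getComponentConfiguration() {
    Map<String, Object> conf = new HashMap<>();
    conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 60); // bin size n = 60s
    return conf;
  }

  @Override
  public void execute(Tuple tuple, BasicOutputCollector collector) {
    if (TupleUtils.isTick(tuple)) {
      // End of a bin: emit the rollup downstream and reset the counters.
      collector.emit(new Values(System.currentTimeMillis(), count, bytes));
      count = 0;
      bytes = 0;
    } else {
      count++;
      bytes += tuple.getBinary(0).length; // assumes raw bytes at field 0
    }
  }

  @Override
  public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("binEndMillis", "count", "bytes"));
  }
}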
>
> Thanks,
> James
>
> 12.07.2016, 14:41, "Zeolla@GMail.com" <zeolla@gmail.com>:
> > I can definitely give it a shot. A kickstart would be appreciated.
> >
> > Jon
> >
> > On Tue, Jul 12, 2016, 17:17 James Sirota <jsirota@apache.org> wrote:
> >
> >>  Jon,
> >>
> >>  Just filed METRON-318. Is this something you would like to work on?
> >>  Would you like help from us to get started?
> >>
> >>  Thanks,
> >>  James
> >>
> >>  12.07.2016, 11:53, "Zeolla@GMail.com" <zeolla@gmail.com>:
> >>  > Hi All,
> >>  >
> >>  > Has there been any additional discussion or development regarding
> >>  > this? I did take a brief look around the JIRA and didn't see anything
> >>  > regarding this, but I may have missed it. Thanks,
> >>  >
> >>  > Jon
> >>  >
> >>  > On Fri, Apr 15, 2016 at 2:01 PM Nick Allen <nick@nickallen.org> wrote:
> >>  >
> >>  >> I definitely agree that you need this level of understanding of your
> >>  >> cluster. It could work the way that you describe.
> >>  >>
> >>  >> I was thinking of it slightly differently, though. The metrics for
> >>  >> this purpose (understanding the performance of an existing cluster)
> >>  >> should come from the actual sensors themselves. For example, I need
> >>  >> to instrument the packet capture process so that it kicks out
> >>  >> time-series-ish metrics that you can monitor in a dashboard over time.
> >>  >>
> >>  >> On Fri, Apr 15, 2016 at 1:40 PM, Zeolla@GMail.com
> >>  >> <zeolla@gmail.com> wrote:
> >>  >>
> >>  >> > However, it would be handy to have something like this perpetually
> >>  >> > running so you know when to scale up/out/down/in a cluster.
> >>  >> >
> >>  >> > On Fri, Apr 15, 2016, 13:35 Nick Allen <nick@nickallen.org> wrote:
> >>  >> >
> >>  >> > > I think it is slightly different. I don't even want to install
> >>  >> > > minimal Kafka infrastructure (Look ma, no Kafka!)
> >>  >> > >
> >>  >> > > The exact implementation would differ based on the data inputs
> >>  >> > > that you are trying to measure, but for example...
> >>  >> > >
> >>  >> > > - To understand raw packet rates I would have a specialized
> >>  >> > > sensor that counts packets and size on the wire. It doesn't do
> >>  >> > > anything more than that.
> >>  >> > > - To understand Netflow rates, it would watch for Netflow packets
> >>  >> > > and count those.
> >>  >> > > - To understand sizing around application logs, a sensor would
> >>  >> > > watch for Syslog packets and count those.
> >>  >> > >
> >>  >> > > The implementation would be more similar to raw packet capture
> >>  >> > > with some DPI. No Hadoop-y components required.
> >>  >> > >
> >>  >> > >
> >>  >> > >
> >>  >> > > On Fri, Apr 15, 2016 at 1:10 PM, James Sirota
> >>  >> > > <jsirota@hortonworks.com> wrote:
> >>  >> > >
> >>  >> > > > So this is exactly what I am proposing. Calculate the metrics
> >>  >> > > > on the fly without landing any data in the cluster. The problem
> >>  >> > > > is that enterprise data volumes are so large you can't just
> >>  >> > > > point them at a Java or a C++ program or sensor. You either need
> >>  >> > > > an existing minimal Kafka infrastructure to take that load, or
> >>  >> > > > you sample the data.
> >>  >> > > >
> >>  >> > > > Thanks,
> >>  >> > > > James
> >>  >> > > >
> >>  >> > > >
> >>  >> > > >
> >>  >> > > >
> >>  >> > > > On 4/15/16, 9:54 AM, "Nick Allen" <nick@nickallen.org> wrote:
> >>  >> > > >
> >>  >> > > > >Or we have the assessment tool not actually land any data. The
> >>  >> > > > >assessment tool becomes a 'sensor' in its own right. You just
> >>  >> > > > >point the input data sets at the assessment tool, it builds
> >>  >> > > > >metrics on the input (for example: count the number of packets
> >>  >> > > > >per second), and then we use those metrics to estimate cluster
> >>  >> > > > >size.
> >>  >> > > > >
> >>  >> > > > >On Wed, Apr 13, 2016 at 5:45 PM, James Sirota
> >>  >> > > > ><jsirota@hortonworks.com> wrote:
> >>  >> > > > >
> >>  >> > > > >> That's an excellent point. So I think there are three ways
> >>  >> > > > >> forward.
> >>  >> > > > >>
> >>  >> > > > >> One is we can assume that there has to be at least a minimal
> >>  >> > > > >> infrastructure in place (at least a subset of Kafka and Storm
> >>  >> > > > >> resources) to run a full-scale assessment. If you point
> >>  >> > > > >> something that blasts millions of messages per second at
> >>  >> > > > >> something like ActiveMQ, you are going to blow up. So the
> >>  >> > > > >> infrastructure to at least receive these kinds of message
> >>  >> > > > >> volumes has to exist as a prerequisite. There is no way to
> >>  >> > > > >> get around that.
> >>  >> > > > >>
> >>  >> > > > >> The second approach I see is sampling. Sampling is a lot
> >>  >> > > > >> less precise, and you can miss peaks that fall outside of
> >>  >> > > > >> your sampling windows. But the obvious benefit is that you
> >>  >> > > > >> don't need a cluster to process these streams. You can
> >>  >> > > > >> probably perform most of your calculations with a
> >>  >> > > > >> multithreaded Java program. Sampling poses a few design
> >>  >> > > > >> challenges. First, where do you sample? Do you sample on the
> >>  >> > > > >> sensor? (The implication here is that we have to program some
> >>  >> > > > >> sort of sampling capability into our sensors.) Do you sample
> >>  >> > > > >> on transport? (Maybe a Flume interceptor or NiFi processor.)
> >>  >> > > > >> There is also the question of what the sampling rate should
> >>  >> > > > >> be. Not knowing the statistical properties of a stream ahead
> >>  >> > > > >> of time, it's hard to make that call.
> >>  >> > > > >>
> >>  >> > > > >> The third option, I think, is an MR job. We can blast the
> >>  >> > > > >> data into HDFS and then go over it with MR to derive the
> >>  >> > > > >> metrics we are looking for. Then we don't have to sample or
> >>  >> > > > >> set up expensive infrastructure to receive a deluge of data.
> >>  >> > > > >> But then we run into the chicken-and-egg problem that in
> >>  >> > > > >> order to size your HDFS you need to have data in HDFS.
> >>  >> > > > >> Ideally you need to capture at least one full week's worth of
> >>  >> > > > >> logs, because patterns throughout the day as well as every
> >>  >> > > > >> day of the week have different statistical properties. So you
> >>  >> > > > >> need off-peak, on-peak, weekday, and weekend data to derive
> >>  >> > > > >> these stats in batch.
> >>  >> > > > >>
> >>  >> > > > >> Any other design ideas?
> >>  >> > > > >>
> >>  >> > > > >> Thanks,
> >>  >> > > > >> James
> >>  >> > > > >>
> >>  >> > > > >>
> >>  >> > > > >>
> >>  >> > > > >>
> >>  >> > > > >>
> >>  >> > > > >> On 4/13/16, 1:59 PM, "Nick Allen" <nick@nickallen.org> wrote:
> >>  >> > > > >>
> >>  >> > > > >> >If the tool starts at Kafka, the user would have to have
> >>  >> > > > >> >already committed to the investment in the infrastructure
> >>  >> > > > >> >and time to set up the sensors that feed Kafka, and Kafka
> >>  >> > > > >> >itself. Maybe it would need to be further upstream?
> >>  >> > > > >> >On Apr 13, 2016 1:05 PM, "James Sirota"
> >>  >> > > > >> ><jsirota@hortonworks.com> wrote:
> >>  >> > > > >> >
> >>  >> > > > >> >> Hi George,
> >>  >> > > > >> >>
> >>  >> > > > >> >> This article describes micro-tuning of an existing
> >>  >> > > > >> >> cluster. What I am proposing is a level up from that. When
> >>  >> > > > >> >> you start with Metron, how do you even know how many nodes
> >>  >> > > > >> >> you need? And of these nodes, how many do you allocate to
> >>  >> > > > >> >> Storm, indexing, storage? How much storage do you need?
> >>  >> > > > >> >> Tuning would be the next step in the process, but this
> >>  >> > > > >> >> tool would answer more fundamental questions about what a
> >>  >> > > > >> >> Metron deployment should look like given the number of
> >>  >> > > > >> >> telemetries and the retention policies of the enterprise.
> >>  >> > > > >> >>
> >>  >> > > > >> >> The best way to get this data (in my opinion) is to have
> >>  >> > > > >> >> some tool that we can plug into Metron's point of ingest
> >>  >> > > > >> >> (Kafka topics) and run for about a week or a month to
> >>  >> > > > >> >> figure that out and spit out the relevant metrics. Based
> >>  >> > > > >> >> on these metrics we can figure out the fundamental things
> >>  >> > > > >> >> about what Metron should look like. Tuning would be the
> >>  >> > > > >> >> next step.
> >>  >> > > > >> >>
> >>  >> > > > >> >> Thanks,
> >>  >> > > > >> >> James
> >>  >> > > > >> >>
> >>  >> > > > >> >>
> >>  >> > > > >> >>
> >>  >> > > > >> >>
> >>  >> > > > >> >> On 4/13/16, 9:52 AM, "George Vetticaden"
> >>  >> > > > >> >> <gvetticaden@hortonworks.com> wrote:
> >>  >> > > > >> >>
> >>  >> > > > >> >> >I have used the following Kafka and Storm best practices
> >>  >> > > > >> >> >guide at numerous customer implementations:
> >>  >> > > > >> >> >
> >>  >> > > > >> >> >https://community.hortonworks.com/articles/550/unofficial-storm-and-kafka-best-practices-guide.html
> >>  >> > > > >> >> >
> >>  >> > > > >> >> >
> >>  >> > > > >> >> >We need to have something similar and prescriptive for
> >>  >> > > > >> >> >Metron based on:
> >>  >> > > > >> >> >1. What data sources are we enabling
> >>  >> > > > >> >> >2. What enrichment services are we enabling
> >>  >> > > > >> >> >3. What threat intel services are we enabling
> >>  >> > > > >> >> >4. What are we indexing into Solr/Elastic, and for how long
> >>  >> > > > >> >> >5. What are we persisting into HDFS
> >>  >> > > > >> >> >
> >>  >> > > > >> >> >Ideally, the Metron assessment tool, combined with an
> >>  >> > > > >> >> >introspection of the user's Ansible configuration, should
> >>  >> > > > >> >> >drive which Ambari blueprint type and configuration are
> >>  >> > > > >> >> >used when the cluster is spun up and the Storm topology is
> >>  >> > > > >> >> >deployed.
> >>  >> > > > >> >> >
> >>  >> > > > >> >> >
> >>  >> > > > >> >> >--
> >>  >> > > > >> >> >George Vetticaden
> >>  >> > > > >> >> >Principal, COE
> >>  >> > > > >> >> >gvetticaden@hortonworks.com
> >>  >> > > > >> >> >(630) 909-9138
> >>  >> > > > >> >> >
> >>  >> > > > >> >> >
> >>  >> > > > >> >> >
> >>  >> > > > >> >> >
> >>  >> > > > >> >> >
> >>  >> > > > >> >> >On 4/13/16, 11:40 AM, "George Vetticaden"
> >>  >> > > > >> >> ><gvetticaden@hortonworks.com> wrote:
> >>  >> > > > >> >> >
> >>  >> > > > >> >> >>+1 to James' suggestion.
> >>  >> > > > >> >> >>We also need to consider not just the data volume and
> >>  >> > > > >> >> >>storage requirements for proper cluster sizing but also
> >>  >> > > > >> >> >>the processing requirements. Given that in the new
> >>  >> > > > >> >> >>architecture we have moved to a single enrichment
> >>  >> > > > >> >> >>topology that will support all data sources, proper
> >>  >> > > > >> >> >>sizing of the enrichment topology will be even more
> >>  >> > > > >> >> >>crucial to maintain SLAs and HA requirements.
> >>  >> > > > >> >> >>The following key questions will apply to each parser
> >>  >> > > > >> >> >>topology and to the single enrichment topology:
> >>  >> > > > >> >> >>
> >>  >> > > > >> >> >>1. Number of workers?
> >>  >> > > > >> >> >>2. Number of workers per machine?
> >>  >> > > > >> >> >>3. Size of each worker (in memory)?
> >>  >> > > > >> >> >>4. Supervisor memory settings?
> >>  >> > > > >> >> >>
> >>  >> > > > >> >> >>The assessment tool should also be used to size
> >>  >> > > > >> >> >>topologies correctly.
> >>  >> > > > >> >> >>
> >>  >> > > > >> >> >>Tuning Kafka, HBase, and Solr/Elastic should also be
> >>  >> > > > >> >> >>governed by the Metron assessment tool.
> >>  >> > > > >> >> >>
> >>  >> > > > >> >> >>
> >>  >> > > > >> >> >>--
> >>  >> > > > >> >> >>George Vetticaden
> >>  >> > > > >> >> >>
> >>  >> > > > >> >> >>
> >>  >> > > > >> >> >>
> >>  >> > > > >> >> >>
> >>  >> > > > >> >> >>
> >>  >> > > > >> >> >>
> >>  >> > > > >> >> >>
> >>  >> > > > >> >> >>On 4/13/16, 11:28 AM, "James Sirota"
> >>  >> > > > >> >> >><jsirota@hortonworks.com> wrote:
> >>  >> > > > >> >> >>
> >>  >> > > > >> >> >>>Prior to adoption of Metron, each adopting entity needs
> >>  >> > > > >> >> >>>to guesstimate its data volume and data storage
> >>  >> > > > >> >> >>>requirements so it can size its cluster properly. I
> >>  >> > > > >> >> >>>propose the creation of an assessment tool that can plug
> >>  >> > > > >> >> >>>in to a Kafka topic for a given telemetry and over time
> >>  >> > > > >> >> >>>produce statistics for ingest volumes and storage
> >>  >> > > > >> >> >>>requirements. The idea is that prior to adoption of
> >>  >> > > > >> >> >>>Metron someone can set up all the feeds and Kafka
> >>  >> > > > >> >> >>>topics, but instead of deploying Metron right away they
> >>  >> > > > >> >> >>>would deploy this tool. This tool would then produce
> >>  >> > > > >> >> >>>statistics for data ingest/storage requirements, and all
> >>  >> > > > >> >> >>>relevant information needed for cluster sizing.
> >>  >> > > > >> >> >>>
> >>  >> > > > >> >> >>>Some of the metrics that can be recorded are:
> >>  >> > > > >> >> >>>
> >>  >> > > > >> >> >>> * Number of system events per second (average, max,
> >>  >> > > > >> >> >>>mean, standard dev)
> >>  >> > > > >> >> >>> * Message size (average, max, mean, standard dev)
> >>  >> > > > >> >> >>> * Average number of peaks
> >>  >> > > > >> >> >>> * Duration of peaks (average, max, mean, standard dev)
> >>  >> > > > >> >> >>>
> >>  >> > > > >> >> >>>If the parser for a telemetry exists, the tool can
> >>  >> > > > >> >> >>>produce additional statistics:
> >>  >> > > > >> >> >>>
> >>  >> > > > >> >> >>> * Number of keys/fields parsed (average, max, mean,
> >>  >> > > > >> >> >>>standard dev)
> >>  >> > > > >> >> >>> * Length of field parsed (average, max, mean, standard
> >>  >> > > > >> >> >>>dev)
> >>  >> > > > >> >> >>> * Length of key parsed (average, max, mean, standard
> >>  >> > > > >> >> >>>dev)
> >>  >> > > > >> >> >>>
> >>  >> > > > >> >> >>>The tool can run for a week or a month and produce
> >>  >> > > > >> >> >>>these kinds of statistics. Then, once the statistics are
> >>  >> > > > >> >> >>>available, we can come up with guidance documentation
> >>  >> > > > >> >> >>>for a recommended cluster setup. Otherwise it's hard to
> >>  >> > > > >> >> >>>properly size a cluster and set up streaming parallelism
> >>  >> > > > >> >> >>>without knowing these metrics.
> >>  >> > > > >> >> >>>
> >>  >> > > > >> >> >>>
> >>  >> > > > >> >> >>>Thoughts/ideas?
> >>  >> > > > >> >> >>>
> >>  >> > > > >> >> >>>Thanks,
> >>  >> > > > >> >> >>>James
> >>  >> > > > >> >> >>
> >>  >> > > > >> >> >>
> >>  >> > > > >> >> >
> >>  >> > > > >> >>
> >>  >> > > > >>
> >>  >> > > > >
> >>  >> > > > >
> >>  >> > > > >
> >>  >> > > > >--
> >>  >> > > > >Nick Allen <nick@nickallen.org>
> >>  >> > > >
> >>  >> > >
> >>  >> > >
> >>  >> > >
> >>  >> > > --
> >>  >> > > Nick Allen <nick@nickallen.org>
> >>  >> > >
> >>  >> > --
> >>  >> >
> >>  >> > Jon
> >>  >> >
> >>  >>
> >>  >> --
> >>  >> Nick Allen <nick@nickallen.org>
> >>  > --
> >>  >
> >>  > Jon
> >>
> >>  -------------------
> >>  Thank you,
> >>
> >>  James Sirota
> >>  PPMC- Apache Metron (Incubating)
> >>  jsirota AT apache DOT org
> > --
> >
> > Jon
>
> -------------------
> Thank you,
>
> James Sirota
> PPMC- Apache Metron (Incubating)
> jsirota AT apache DOT org
>
-- 

Jon
