metron-dev mailing list archives

From "Zeolla@GMail.com" <zeo...@gmail.com>
Subject Re: [DISCUSS] Error Indexing
Date Tue, 24 Jan 2017 19:02:24 GMT
Regarding error vs. validation: I'm not very concerned either way.  I
initially assumed they would be combined, and I agree with that approach, but
splitting them out isn't a big deal to me either.

Re: Ryan.  Yes, exactly.  In the case of a parser issue (or anywhere else
where it's not possible to pick out the exact thing causing the issue) it
would be a hash of the complete message.
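
A minimal sketch of that fallback, in Python.  The function name
error_hash and the choice of SHA3-256 are assumptions made for
illustration; nothing in Metron defines them:

```python
import hashlib
from typing import Optional


def error_hash(raw_message: bytes, offending_value: Optional[bytes] = None) -> str:
    """Hash the value that caused the error, or the whole message if unknown.

    Hypothetical helper: the name and the SHA3-256 choice are assumptions.
    """
    # Enrichment-style failures know the offending field value; parser
    # failures usually do not, so fall back to the complete raw message.
    data = offending_value if offending_value is not None else raw_message
    return hashlib.sha3_256(data).hexdigest()
```

An enrichment failure on an IP field would call error_hash(msg, b"8.8.8.8"),
while a parser failure would call error_hash(msg) and hash everything.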

Regarding the architecture, I mostly agree with James, except that I think
step 3 also needs to be able to group errors by the original data (identify
replays, repeat issues with data in a specific field, issues with
consistently different data, etc.).  This is essentially the first step of
troubleshooting, which I assume you are doing if you're looking at the
error dashboard.
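
As a rough illustration of that grouping step, here is a Python sketch
that buckets error records by a hash of the original data.  The record
layout and the "raw_hash" field name are hypothetical; in practice this
would more likely be a dashboard aggregation than application code:

```python
from collections import defaultdict


def group_by_hash(errors):
    """Group error records by the hash of the data that caused them."""
    groups = defaultdict(list)
    for err in errors:
        # "raw_hash" is an assumed field name, not an existing Metron field.
        groups[err["raw_hash"]].append(err)
    return dict(groups)


def repeats(errors):
    """Keep only hashes seen more than once (replays or repeat offenders)."""
    return {h: errs for h, errs in group_by_hash(errors).items() if len(errs) > 1}
```

Any hash that shows up in repeats() points at the same original data
failing more than once, which is the replay/repeat case described above.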

If the hash gets moved out of the initial implementation, I'm fairly
certain you lose this ability.  The point here isn't to handle long fields
(although that's a benefit of this approach); it's to attach a unique
identifier to the error/validation message that links it to the
original problem.  I'd be happy to consider alternative solutions to this
problem (for instance, actually sending across the data itself); I just
haven't been able to think of another way to do this that I like better.
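
To make the idea concrete, a hedged Python sketch of an error record
that carries such an identifier.  Only "error_type" comes from this
thread; the "message" and "raw_hash" field names, the function name, and
the use of SHA3-256 are assumptions:

```python
import hashlib
import json


def build_error_record(error_type: str, error_text: str, raw_message: bytes) -> str:
    """Serialize an error record that links back to the original data by hash."""
    record = {
        "error_type": error_type,  # e.g. "parser_error", as proposed in this thread
        "message": error_text,     # hypothetical field name
        # Hypothetical field: identifies the original data without storing it.
        "raw_hash": hashlib.sha3_256(raw_message).hexdigest(),
    }
    return json.dumps(record, sort_keys=True)
```

Two errors caused by the same original data get the same raw_hash even
if their error text differs, which is what links them together.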

Jon

On Tue, Jan 24, 2017 at 1:13 PM Ryan Merriman <merrimanr@gmail.com> wrote:

> We also need a JIRA for any install/Ansible/MPack work needed.
>
> On Tue, Jan 24, 2017 at 12:06 PM, James Sirota <jsirota@apache.org> wrote:
>
> > Now that I had some time to think about it I would collapse all error and
> > validation topics into one.  We can differentiate between different views
> > of the data (split by error source etc) via Kibana dashboards.  I would
> > implement this feature incrementally.  First I would modify all the bolts
> > to log to a single topic.  Second, I would get the error indexing done by
> > attaching the indexing topology to the error topic. Third I would create
> > the necessary dashboards to view errors and validation failures by source.
> > Lastly, I would file a follow-on JIRA to introduce hashing of errors or
> > fields that are too long.  It seems like a separate feature that we need to
> > think through.  We may need a stellar function around that.
> >
> > Thanks,
> > James
> >
> > 24.01.2017, 10:25, "Ryan Merriman" <merrimanr@gmail.com>:
> > > I understand what Jon is talking about. He's proposing we hash the value
> > > that caused the error, not necessarily the error message itself. For an
> > > enrichment this is easy. Just pass along the field value that failed
> > > enrichment. For other cases the field that caused the error may not be so
> > > obvious. Take parser validation for example. The message is validated as
> > > a whole and it may not be easy to determine which field is the cause. In
> > > that case would a hash of the whole message work?
> > >
> > > There is a broader architectural discussion that needs to happen before we
> > > can implement this. Currently we have an indexing topology that reads from
> > > 1 topic and writes messages to ES but errors are written to several
> > > different topics:
> > >
> > >    - parser_error
> > >    - parser_invalid
> > >    - enrichments_error
> > >    - threatintel_error
> > >    - indexing_error
> > >
> > > I can see 4 possible approaches to implementing this:
> > >
> > >    1. Create an index topology for each error topic
> > >       1. Good because we can easily reuse the indexing topology and would
> > >       require the least development effort
> > >       2. Bad because it would consume a lot of extra worker slots
> > >    2. Move the topic name into the error JSON message as a new "error_type"
> > >    field and write all messages to the indexing topic
> > >       1. Good because we don't need to create a new topology
> > >       2. Bad because we would be flowing data and errors through the same
> > >       topology. A spike in errors could affect message indexing.
> > >    3. Compromise between 1 and 2. Create another indexing topology that is
> > >    dedicated to indexing errors. Move the topic name into the error JSON
> > >    message as a new "error_type" field and write all errors to a single
> > >    error topic.
> > >    4. Write a completely new topology with multiple spouts (1 for each
> > >    error type listed above) that all feed into a single
> > >    BulkMessageWriterBolt.
> > >       1. Good because the current topologies would not need to change
> > >       2. Bad because it would require the most development effort, would
> > >       not reuse existing topologies and takes up more worker slots than 3
> > >
> > > Are there other approaches I haven't thought of? I think 1 and 2 are off
> > > the table because they are shortcuts and not good long-term solutions. 3
> > > would be my choice because it introduces less complexity than 4. Thoughts?
> > >
> > > Ryan
> > >
> > > On Mon, Jan 23, 2017 at 5:44 PM, Zeolla@GMail.com <zeolla@gmail.com> wrote:
> > >
> > >>  In that case the hash would be of the value in the IP field, such as
> > >>  sha3(8.8.8.8).
> > >>
> > >>  Jon
> > >>
> > >>  On Mon, Jan 23, 2017, 6:41 PM James Sirota <jsirota@apache.org> wrote:
> > >>
> > >>  > Jon,
> > >>  >
> > >>  > I am still not entirely following why we would want to use hashing. For
> > >>  > example if my error is "Your IP field is invalid and failed validation"
> > >>  > hashing this error string will always result in the same hash. Why not
> > >>  > just use the actual error string? Can you provide an example where you
> > >>  > would use it?
> > >>  >
> > >>  > Thanks,
> > >>  > James
> > >>  >
> > >>  > 23.01.2017, 16:29, "Zeolla@GMail.com" <zeolla@gmail.com>:
> > >>  > > For 1 - I'm good with that.
> > >>  > >
> > >>  > > I'm talking about hashing the relevant content itself not the error.
> > >>  > > Some benefits are (1) minimize load on search index (there's minimal
> > >>  > > benefit in spending the CPU and disk to keep it at full fidelity
> > >>  > > (tokenize and store)) (2) provide something to key on for dashboards
> > >>  > > (assuming a good hash algorithm that avoids collisions and is second
> > >>  > > preimage resistant) and (3) specific to errors, if the issue is that it
> > >>  > > failed to index, a hash gives us some protection that the issue will
> > >>  > > not occur twice.
> > >>  > >
> > >>  > > Jon
> > >>  > >
> > >>  > > On Mon, Jan 23, 2017, 2:47 PM James Sirota <jsirota@apache.org> wrote:
> > >>  > >
> > >>  > > Jon,
> > >>  > >
> > >>  > > With regards to 1, collapsing to a single dashboard for each would be
> > >>  > > fine. So we would have one error index and one "failed to validate"
> > >>  > > index. The distinction is that errors would be things that went wrong
> > >>  > > during stream processing (failed to parse, etc...), while validation
> > >>  > > failures are messages that explicitly failed stellar validation/schema
> > >>  > > enforcement. There should be relatively few of the second type.
> > >>  > >
> > >>  > > With respect to 3, why do you want the error hashed? Why not just
> > >>  > > search for the error text?
> > >>  > >
> > >>  > > Thanks,
> > >>  > > James
> > >>  > >
> > >>  > > 20.01.2017, 14:01, "Zeolla@GMail.com" <zeolla@gmail.com>:
> > >>  > >> As someone who currently fills the platform engineer role, I can give
> > >>  > >> this idea a huge +1. My thoughts:
> > >>  > >>
> > >>  > >> 1. I think it depends on exactly what data is pushed into the index
> > >>  > >> (#3). However, assuming the errors you proposed recording, I can't see
> > >>  > >> huge benefits to having more than one dashboard. I would be happy to be
> > >>  > >> persuaded otherwise.
> > >>  > >>
> > >>  > >> 2. I would say yes, storing the errors in HDFS in addition to indexing
> > >>  > >> is a good thing. Using METRON-510
> > >>  > >> <https://issues.apache.org/jira/browse/METRON-510> as a case study,
> > >>  > >> there is the potential in this environment for attacker-controlled data
> > >>  > >> to result in processing errors which could be a method of evading
> > >>  > >> security monitoring. Once an attack is identified, the long term HDFS
> > >>  > >> storage would allow better historical analysis for
> > >>  > >> low-and-slow/persistent attacks (I'm thinking of a method of data exfil
> > >>  > >> that also won't successfully get stored in Lucene, but is hard to
> > >>  > >> identify over a short period of time).
> > >>  > >> - Along this line, I think that there are various parts of Metron (this
> > >>  > >> included) which could benefit from having a method of configuring data
> > >>  > >> aging by bucket in HDFS (Following Nick's comments here
> > >>  > >> <https://issues.apache.org/jira/browse/METRON-477>).
> > >>  > >>
> > >>  > >> 3. I would potentially add a hash of the content that failed validation
> > >>  > >> to help identify repeats over time with less of a concern that you'd
> > >>  > >> have back to back failures (i.e. instead of storing the value itself).
> > >>  > >> Additionally, I think it's helpful to be able to search all times there
> > >>  > >> was an indexing error (instead of it hitting the catch-all).
> > >>  > >>
> > >>  > >> Jon
> > >>  > >>
> > >>  > >> On Fri, Jan 20, 2017 at 1:17 PM James Sirota <jsirota@apache.org> wrote:
> > >>  > >>
> > >>  > >> We already have a capability to capture bolt errors and validation
> > >>  > >> errors and pipe them into a Kafka topic. I want to propose that we
> > >>  > >> attach a writer topology to the error and validation failed kafka
> > >>  > >> topics so that we can (a) create a new ES index for these errors and
> > >>  > >> (b) create a new Kibana dashboard to visualize them. The benefit would
> > >>  > >> be that errors and validation failures would be easier to see and
> > >>  > >> analyze.
> > >>  > >>
> > >>  > >> I am seeking feedback on the following:
> > >>  > >>
> > >>  > >> - How granular would we want this feature to be? Think we would want
> > >>  > >> one index/dashboard per source? Or would it be better to collapse
> > >>  > >> everything into the same index?
> > >>  > >> - Do we care about storing these errors in HDFS as well? Or is indexing
> > >>  > >> them enough?
> > >>  > >> - What types of errors should we record? I am proposing:
> > >>  > >>
> > >>  > >> For error reporting:
> > >>  > >> --Message failed to parse
> > >>  > >> --Enrichment failed to enrich
> > >>  > >> --Threat intel feed failures
> > >>  > >> --Generic catch-all for all other errors
> > >>  > >>
> > >>  > >> For validation reporting:
> > >>  > >> --What part of message failed validation
> > >>  > >> --What stellar validator caused the failure
> > >>  > >>
> > >>  > >> -------------------
> > >>  > >> Thank you,
> > >>  > >>
> > >>  > >> James Sirota
> > >>  > >> PPMC- Apache Metron (Incubating)
> > >>  > >> jsirota AT apache DOT org
> > >>  > >>
> > >>  > >> --
> > >>  > >>
> > >>  > >> Jon
> > >>  > >>
> > >>  > >> Sent from my mobile device
> > >>  > >
> > >>  > > -------------------
> > >>  > > Thank you,
> > >>  > >
> > >>  > > James Sirota
> > >>  > > PPMC- Apache Metron (Incubating)
> > >>  > > jsirota AT apache DOT org
> > >>  > >
> > >>  > > --
> > >>  > >
> > >>  > > Jon
> > >>  > >
> > >>  > > Sent from my mobile device
> > >>  >
> > >>  > -------------------
> > >>  > Thank you,
> > >>  >
> > >>  > James Sirota
> > >>  > PPMC- Apache Metron (Incubating)
> > >>  > jsirota AT apache DOT org
> > >>  >
> > >>  --
> > >>
> > >>  Jon
> > >>
> > >>  Sent from my mobile device
> >
> > -------------------
> > Thank you,
> >
> > James Sirota
> > PPMC- Apache Metron (Incubating)
> > jsirota AT apache DOT org
> >
>
-- 

Jon

Sent from my mobile device
