metron-dev mailing list archives

From James Sirota <jsir...@apache.org>
Subject Re: [DISCUSS] Error Indexing
Date Tue, 24 Jan 2017 18:06:29 GMT
Now that I've had some time to think about it, I would collapse all error and validation topics
into one.  We can differentiate between different views of the data (split by error source,
etc.) via Kibana dashboards.  I would implement this feature incrementally.  First, I would
modify all the bolts to log to a single topic.  Second, I would get the error indexing done
by attaching the indexing topology to the error topic.  Third, I would create the necessary
dashboards to view errors and validation failures by source.  Lastly, I would file a follow-on
JIRA to introduce hashing of errors or fields that are too long.  That seems like a separate
feature we need to think through.  We may need a Stellar function around that.

Thanks,
James 

24.01.2017, 10:25, "Ryan Merriman" <merrimanr@gmail.com>:
> I understand what Jon is talking about. He's proposing we hash the value
> that caused the error, not necessarily the error message itself. For an
> enrichment this is easy. Just pass along the field value that failed
> enrichment. For other cases the field that caused the error may not be so
> obvious. Take parser validation for example. The message is validated as
> a whole and it may not be easy to determine which field is the cause. In
> that case would a hash of the whole message work?
>
> There is a broader architectural discussion that needs to happen before we
> can implement this. Currently we have an indexing topology that reads from
> 1 topic and writes messages to ES but errors are written to several
> different topics:
>
>    - parser_error
>    - parser_invalid
>    - enrichments_error
>    - threatintel_error
>    - indexing_error
>
> I can see 4 possible approaches to implementing this:
>
>    1. Create an index topology for each error topic
>       1. Good because we can easily reuse the indexing topology and would
>       require the least development effort
>       2. Bad because it would consume a lot of extra worker slots
>    2. Move the topic name into the error JSON message as a new "error_type"
>    field and write all messages to the indexing topic
>       1. Good because we don't need to create a new topology
>       2. Bad because we would be flowing data and errors through the same
>       topology. A spike in errors could affect message indexing.
>    3. Compromise between 1 and 2. Create another indexing topology that is
>    dedicated to indexing errors. Move the topic name into the error JSON
>    message as a new "error_type" field and write all errors to a single error
>    topic.
>    4. Write a completely new topology with multiple spouts (1 for each
>    error type listed above) that all feed into a single BulkMessageWriterBolt.
>       1. Good because the current topologies would not need to change
>       2. Bad because it would require the most development effort, would
>       not reuse existing topologies and takes up more worker slots than 3
>
> Are there other approaches I haven't thought of? I think 1 and 2 are off
> the table because they are shortcuts and not good long-term solutions. 3
> would be my choice because it introduces less complexity than 4. Thoughts?
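To make option 3 concrete, the tagging step could look something like the sketch below. This is a rough illustration only: the `error_type` field name comes from the proposal above, but the `ErrorTagger` class, the `tag` method, and the use of a plain `Map` in place of the actual JSON error message are all assumptions.

```java
import java.util.HashMap;
import java.util.Map;

public class ErrorTagger {
    // Fold the originating topic name into the message itself, so a single
    // dedicated error-indexing topology can still distinguish parser vs.
    // enrichment vs. threatintel errors when building dashboards.
    static Map<String, Object> tag(Map<String, Object> errorMessage, String sourceTopic) {
        Map<String, Object> tagged = new HashMap<>(errorMessage);
        tagged.put("error_type", sourceTopic);
        return tagged;
    }
}
```

Each bolt would then write the tagged message to one consolidated error topic instead of its own, and the error-indexing topology reads from just that topic.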
>
> Ryan
>
> On Mon, Jan 23, 2017 at 5:44 PM, Zeolla@GMail.com <zeolla@gmail.com> wrote:
>
>>  In that case the hash would be of the value in the IP field, such as
>>  sha3(8.8.8.8).
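A minimal sketch of what hashing the offending value could look like, assuming the JDK's built-in SHA3-256 (available since Java 9) stands in for whatever digest is ultimately chosen:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ValueHasher {
    // Hash the field value that failed (e.g. the IP itself), not the error
    // string, so repeats of the same bad value collide on purpose and can be
    // keyed on in a dashboard.
    static String sha3(String fieldValue) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA3-256"); // JDK 9+
            byte[] raw = digest.digest(fieldValue.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : raw) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The same call works whether the input is a single field value or, for parser errors where the failing field isn't identifiable, the raw message as a whole.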
>>
>>  Jon
>>
>>  On Mon, Jan 23, 2017, 6:41 PM James Sirota <jsirota@apache.org> wrote:
>>
>>  > Jon,
>>  >
>>  > I am still not entirely following why we would want to use hashing. For
>>  > example if my error is "Your IP field is invalid and failed validation"
>>  > hashing this error string will always result in the same hash. Why not
>>  > just use the actual error string? Can you provide an example where you
>>  > would use it?
>>  >
>>  > Thanks,
>>  > James
>>  >
>>  > 23.01.2017, 16:29, "Zeolla@GMail.com" <zeolla@gmail.com>:
>>  > > For 1 - I'm good with that.
>>  > >
>>  > > I'm talking about hashing the relevant content itself, not the error.
>>  > > Some benefits are (1) minimize load on the search index (there's
>>  > > minimal benefit in spending the CPU and disk to keep it at full
>>  > > fidelity (tokenize and store)), (2) provide something to key on for
>>  > > dashboards (assuming a good hash algorithm that avoids collisions and
>>  > > is second preimage resistant), and (3) specific to errors, if the
>>  > > issue is that it failed to index, a hash gives us some protection
>>  > > that the issue will not occur twice.
>>  > >
>>  > > Jon
>>  > >
>>  > > On Mon, Jan 23, 2017, 2:47 PM James Sirota <jsirota@apache.org> wrote:
>>  > >
>>  > > Jon,
>>  > >
>>  > > With regards to 1, collapsing to a single dashboard for each would be
>>  > > fine. So we would have one error index and one "failed to validate"
>>  > > index. The distinction is that errors would be things that went wrong
>>  > > during stream processing (failed to parse, etc...), while validation
>>  > > failures are messages that explicitly failed stellar validation/schema
>>  > > enforcement. There should be relatively few of the second type.
>>  > >
>>  > > With respect to 3, why do you want the error hashed? Why not just
>>  > > search for the error text?
>>  > >
>>  > > Thanks,
>>  > > James
>>  > >
>>  > > 20.01.2017, 14:01, "Zeolla@GMail.com" <zeolla@gmail.com>:
>>  > >> As someone who currently fills the platform engineer role, I can
>>  > >> give this idea a huge +1. My thoughts:
>>  > >>
>>  > >> 1. I think it depends on exactly what data is pushed into the index
>>  > >> (#3). However, assuming the errors you proposed recording, I can't
>>  > >> see huge benefits to having more than one dashboard. I would be
>>  > >> happy to be persuaded otherwise.
>>  > >>
>>  > >> 2. I would say yes, storing the errors in HDFS in addition to
>>  > >> indexing is a good thing. Using METRON-510
>>  > >> <https://issues.apache.org/jira/browse/METRON-510> as a case study,
>>  > >> there is the potential in this environment for attacker-controlled
>>  > >> data to result in processing errors, which could be a method of
>>  > >> evading security monitoring. Once an attack is identified, the
>>  > >> long-term HDFS storage would allow better historical analysis for
>>  > >> low-and-slow/persistent attacks (I'm thinking of a method of data
>>  > >> exfil that also won't successfully get stored in Lucene, but is
>>  > >> hard to identify over a short period of time).
>>  > >> - Along this line, I think that there are various parts of Metron
>>  > >> (this included) which could benefit from having a method of
>>  > >> configuring data aging by bucket in HDFS (following Nick's comments
>>  > >> here <https://issues.apache.org/jira/browse/METRON-477>).
>>  > >>
>>  > >> 3. I would potentially add a hash of the content that failed
>>  > >> validation to help identify repeats over time with less of a
>>  > >> concern that you'd have back-to-back failures (i.e. instead of
>>  > >> storing the value itself). Additionally, I think it's helpful to be
>>  > >> able to search all times there was an indexing error (instead of it
>>  > >> hitting the catch-all).
>>  > >>
>>  > >> Jon
>>  > >>
>>  > >> On Fri, Jan 20, 2017 at 1:17 PM James Sirota <jsirota@apache.org> wrote:
>>  > >>
>>  > >> We already have a capability to capture bolt errors and validation
>>  > >> errors and pipe them into a Kafka topic. I want to propose that we
>>  > >> attach a writer topology to the error and validation-failed Kafka
>>  > >> topics so that we can (a) create a new ES index for these errors
>>  > >> and (b) create a new Kibana dashboard to visualize them. The
>>  > >> benefit would be that errors and validation failures would be
>>  > >> easier to see and analyze.
>>  > >>
>>  > >> I am seeking feedback on the following:
>>  > >>
>>  > >> - How granular would we want this feature to be? Do we want one
>>  > >> index/dashboard per source? Or would it be better to collapse
>>  > >> everything into the same index?
>>  > >> - Do we care about storing these errors in HDFS as well? Or is
>>  > >> indexing them enough?
>>  > >> - What types of errors should we record? I am proposing:
>>  > >>
>>  > >> For error reporting:
>>  > >> --Message failed to parse
>>  > >> --Enrichment failed to enrich
>>  > >> --Threat intel feed failures
>>  > >> --Generic catch-all for all other errors
>>  > >>
>>  > >> For validation reporting:
>>  > >> --What part of message failed validation
>>  > >> --What stellar validator caused the failure
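Pulling the proposed error and validation fields together, an indexed error document might look something like the sketch below. This is an illustration only; every field name here is an assumption, not a settled schema, and the hash value is a placeholder:

```json
{
  "source.type": "bro",
  "error_type": "parser_error",
  "timestamp": 1485281189000,
  "message": "Message failed to parse",
  "raw_message_hash": "<sha3 of the raw message>",
  "failed_validator": "IS_IP"
}
```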
>>  > >>
>>  > >> -------------------
>>  > >> Thank you,
>>  > >>
>>  > >> James Sirota
>>  > >> PPMC- Apache Metron (Incubating)
>>  > >> jsirota AT apache DOT org
>>  > >>
>>  > >> --
>>  > >>
>>  > >> Jon
>>  > >>
>>  > >> Sent from my mobile device
>>  > >
>>  > > -------------------
>>  > > Thank you,
>>  > >
>>  > > James Sirota
>>  > > PPMC- Apache Metron (Incubating)
>>  > > jsirota AT apache DOT org
>>  > >
>>  > > --
>>  > >
>>  > > Jon
>>  > >
>>  > > Sent from my mobile device
>>  >
>>  > -------------------
>>  > Thank you,
>>  >
>>  > James Sirota
>>  > PPMC- Apache Metron (Incubating)
>>  > jsirota AT apache DOT org
>>  >
>>  --
>>
>>  Jon
>>
>>  Sent from my mobile device

------------------- 
Thank you,

James Sirota
PPMC- Apache Metron (Incubating)
jsirota AT apache DOT org
