metron-dev mailing list archives

From "Zeolla@GMail.com" <zeo...@gmail.com>
Subject Re: [DISCUSS] Unique id for messages
Date Fri, 10 Mar 2017 15:51:17 GMT
I definitely think that this is a valuable discussion.  I seem to recall
cstella mentioning at some point in the past that there is a UUID already
used in storm that we might be able to expose into the message itself, but
I could be wrong.

For additional context regarding prior discussions, this was also briefly
discussed in another topic here
<https://lists.apache.org/thread.html/b039f0f0a5e6cfaf30944dc768088e1e1bd5dae4b2247dda12698805@%3Cdev.metron.apache.org%3E>.
In that context I was hoping to be able to link messages across all
indexing destinations (HDFS, ES, Solr, etc.).

On Fri, Mar 10, 2017 at 9:26 AM Raghu Mitra Kandikonda <rksv@hortonworks.com>
wrote:

> Hi All,
>
> I would like to start a discussion around adding a unique id to all the
> parsed messages.  I feel there was a discussion around a similar topic
> but I am not sure as a community we agreed on a proposal.
>
> We could
> -use a random number generator like UUID but this might have performance
> implications
> -use a kafka topic name + systemtime + Kafka message offset to generate a
> unique identifier
> -use the input message to generate a hashcode
>
> Any thoughts?
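
For illustration, the three options listed above could be sketched in Python
roughly as follows.  This is only a sketch under stated assumptions: the
function names are made up for this example and none of it is existing Metron
code.  The second function is a variant of the proposal: it leaves out the
system time and adds the Kafka partition, since Kafka offsets are only unique
within a single partition of a topic.

```python
import hashlib
import uuid


def random_id() -> str:
    """Option 1: a random (version 4) UUID; unique but not reproducible."""
    return str(uuid.uuid4())


def kafka_position_id(topic: str, partition: int, offset: int) -> str:
    """Option 2 (variant): identify a message by its position in Kafka.

    The partition argument is an assumption added for this sketch, because
    an offset alone does not identify a message across partitions.
    """
    return f"{topic}-{partition}-{offset}"


def content_hash_id(raw_message: bytes) -> str:
    """Option 3: a digest of the raw input; identical inputs share an id."""
    return hashlib.sha256(raw_message).hexdigest()
```

The trade-off is that the first two ids distinguish replays of the same raw
message, while the third deliberately maps identical messages to the same id.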
>
> (Attached email that had similar discussion for error indexing)
>
> Regards,
> RaghuM
>
>
>
> ---------- Forwarded message ----------
> From: "Zeolla@GMail.com" <zeolla@gmail.com>
> To: "dev@metron.incubator.apache.org" <dev@metron.incubator.apache.org>
> Cc:
> Bcc:
> Date: Wed, 1 Feb 2017 22:18:12 +0000
> Subject: Re: [DISCUSS] Error Indexing
> Simply as a unique identifier of the original information which is failing
> some step, and thus giving you something to key in on and create a count of
> unique events and prioritize issues without the concern of cyclical issues
> (if the issue is with indexing a specific message, and you try to index it
> again, it will just fail in a loop).
>
> Jon
>
> On Wed, Feb 1, 2017 at 6:59 AM Dima Kovalyov <Dima.Kovalyov@sstech.us>
> wrote:
>
> > That's a great topic of discussion.
> >
> > Throughout the thread the idea of hashing the message that failed keeps
> > changing.  Can someone please explain why you plan to use this hash,
> > and how?
> >
> > - Dima
> >
> > On 02/01/2017 06:23 AM, Zeolla@GMail.com wrote:
> > > After thinking on this for a few days I recant my previous suggestion of
> > > TupleHash256.  It's still a bit early for SHA-3 - no good reference
> > > implementations/libraries exist (I did some searching and emailing), it is
> > > optimized for hardware but no hardware implementation is widely accessible,
> > > FIPS 140-3 is still not close to finalized, etc.
> > >
> > > I think we could simulate the benefits of tuplehash by sorting the tuples,
> > > then doing SHA-256(len(tuple1) | tuple1 | ... | len(tuplen) | tuplen).
> > > Happy to entertain opposing thoughts, such as BLAKE2, etc. but with the
> > > likely users of Metron, I think sticking with FIPS 140-2 is a solid choice.
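
A minimal sketch of the sorted, length-prefixed SHA-256 construction described
above, in Python.  Several details are assumptions made for this example and
are not specified in the thread: a "tuple" is taken to be a field name/value
pair, values are encoded as UTF-8 strings, and each length is written as an
8-byte big-endian integer.  The function name is hypothetical.

```python
import hashlib
import struct


def tuple_hash_sha256(fields: dict) -> str:
    """SHA-256 over sorted, length-prefixed field names and values."""
    digest = hashlib.sha256()
    # Sorting by field name makes the result independent of insertion order.
    for key, value in sorted(fields.items()):
        for part in (key, str(value)):
            encoded = part.encode("utf-8")
            # The length prefix keeps ("ab", "c") and ("a", "bc") from
            # producing the same byte stream.
            digest.update(struct.pack(">Q", len(encoded)))
            digest.update(encoded)
    return digest.hexdigest()
```

With this, {"ip_src_addr": "8.8.8.8", "ip_dst_addr": "10.0.0.1"} hashes to the
same value regardless of the order in which the fields were added.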
> > >
> > > Jon
> > >
> > > On Thu, Jan 26, 2017, 11:23 AM Zeolla@GMail.com <zeolla@gmail.com> wrote:
> > >
> > > So one more thing regarding why I think we should throw an exception on a
> > > failed enrichment.  If we do make something like username a constant field,
> > > in cases where that is used to calculate rawMessage_hash, if it fails to
> > > enrich, the hash would be different compared to when it succeeds.  Of
> > > course I think the initial intent of adding username as a constant field
> > > would be to handle it in the parsers, where that information is provided in
> > > the messages themselves, but how would Threat Intel know the difference?
> > > In my environment I am looking forward to a streaming enrichment that adds
> > > the username, where applicable, anywhere I have an IP.
> > >
> > > My hesitant suggestion for a hashing algorithm would be to use
> > > TupleHash256, as it is a NIST-provided implementation of SHA-3 (using
> > > cSHAKE) for this use case.  Details here
> > > <http://nvlpubs.nist.gov/nistpubs/specialpublications/nist.sp.800-185.pdf>.
> > > However, I haven't been able to find a reference implementation of this in
> > > any language, so that's a bit of a downside.  A more general SHA3-256
> > > implementation where we handle ordering could work as well, but would be
> > > significantly less optimal.
> > >
> > > Jon
> > >
> > > On Thu, Jan 26, 2017 at 10:20 AM Ryan Merriman <merrimanr@gmail.com> wrote:
> > >
> > > Jon, I misread the code in the GenericEnrichmentBolt.  The error is
> > > forwarded on so no issues there.
> > >
> > > Defaulting to the common fields makes sense.  I will dig into the
> > > GenericEnrichmentBolt more, maybe there is a way to get the error fields
> > > without having to significantly change things.  Any opinion on a hashing
> > > algorithm?
> > >
> > > On Wed, Jan 25, 2017 at 9:37 PM, Zeolla@GMail.com <zeolla@gmail.com> wrote:
> > >
> > >> Although hashing the whole message is better than nothing, it misses a lot
> > >> of the benefits we could get.
> > >>
> > >> While I'd love to have consistency for this field across all of the
> > >> different error.types, it appears that may not be reasonably possible
> > >> because of the parsers.  So, how about something like hash all of the
> > >> constant fields
> > >> <https://github.com/apache/incubator-metron/blob/master/metron-platform/metron-common/src/main/java/org/apache/metron/common/Constants.java>
> > >> excluding timestamp and original_string unless it is a parser, in which case
> > >> hash the entire message?  This gives us some measure of event uniqueness and it
> > >> can grow as we define additional constant fields (I recall discussing with
> > >> someone else on the list regarding expanding those standard fields to
> > >> include things like usernames but I can't find the specific email
> > >> exchange).
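
The field-selection rule proposed above could look roughly like this in
Python.  This is a sketch only: the set of constant fields below is an
illustrative subset chosen for this example, the "parser_error" label is
borrowed from the error topic names used elsewhere in this thread, and
serializing to sorted JSON before hashing is an assumption, not something
decided here.

```python
import hashlib
import json

# Illustrative subset of constant fields (an assumption for this sketch).
CONSTANT_FIELDS = {"ip_src_addr", "ip_dst_addr", "ip_src_port",
                   "ip_dst_port", "protocol", "timestamp", "original_string"}
# Fields that vary between otherwise identical events.
EXCLUDED_FIELDS = {"timestamp", "original_string"}


def error_hash(message: dict, error_type: str) -> str:
    """Hash the whole message for parser errors, else the constant fields."""
    if error_type == "parser_error":
        selected = message
    else:
        wanted = CONSTANT_FIELDS - EXCLUDED_FIELDS
        selected = {k: v for k, v in message.items() if k in wanted}
    # Sorted keys give a stable serialization regardless of field order.
    canonical = json.dumps(selected, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two enrichment errors for the same ip_src_addr then share a hash even when
their timestamps differ, while parser errors are only grouped when the whole
message is identical.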
> > >>
> > >> Because some enrichments can be heavily relied on, I think it makes sense
> > >> to put a message onto the error queue when it throws an exception.  Not
> > >> only does this help troubleshoot edge cases, but it makes issues more
> > >> obvious when assembling a new enrichment in dev/test.  I can't think of a
> > >> scenario currently where an enrichment would only be "best effort" and that
> > >> I wouldn't want that error indexed and retrievable.  However, this gets
> > >> interesting when talking about the various options to solve the "Enrich
> > >> enrichment" discussion from earlier in the month.  We can keep that part of
> > >> this separate though, as I don't think that's being actively pursued right
> > >> now.
> > >>
> > >> Jon
> > >>
> > >> On Wed, Jan 25, 2017 at 10:49 AM David Lyle <dlyle65535@gmail.com> wrote:
> > >>
> > >> RE: separate JIRA for MPack/Ansible. No objection to tracking them
> > >> separately, but for this item to be complete, you'll need both the feature
> > >> and the ability to install it.
> > >>
> > >> -D...
> > >>
> > >>
> > >> On Tue, Jan 24, 2017 at 5:33 PM, Ryan Merriman <merrimanr@gmail.com>
> > >> wrote:
> > >>
> > >>> Assuming we're going to write all errors to a single error topic, I think
> > >>> it makes sense to agree on an error message schema and handle errors across
> > >>> the 3 different topologies in the same way with a single implementation.
> > >>> The implementation in ParserBolt (ErrorUtils.handleError) produces the most
> > >>> verbose error object so I think it's a good candidate for the single
> > >>> implementation.  Here is the message structure it currently produces:
> > >>>
> > >>> {
> > >>>   "exception": "java.lang.Exception: there was an error",
> > >>>   "hostname": "host",
> > >>>   "stack": "java.lang.Exception: ...",
> > >>>   "time": 1485295416563,
> > >>>   "message": "there was an error",
> > >>>   "rawMessage": "raw message",
> > >>>   "rawMessage_bytes": [],
> > >>>   "source.type": "bro_error"
> > >>> }
> > >>>
> > >>> From our discussion so far we need to add a couple fields:  an error type
> > >>> and hash id.  Adding these to the message looks like:
> > >>>
> > >>> {
> > >>>   "exception": "java.lang.Exception: there was an error",
> > >>>   "hostname": "host",
> > >>>   "stack": "java.lang.Exception: ...",
> > >>>   "time": 1485295416563,
> > >>>   "message": "there was an error",
> > >>>   "rawMessage": "raw message",
> > >>>   "rawMessage_bytes": [],
> > >>>   "source.type": "bro_error",
> > >>>   "error.type": "parser_error",
> > >>>   "rawMessage_hash": "dde41b9920954f94066daf6291fb58a9"
> > >>> }
> > >>>
> > >>> We should also consider expanding the error types I listed earlier.
> > >>> Instead of just having "indexing_error" we could have
> > >>> "elasticsearch_indexing_error", "hdfs_indexing_error" and so on.
> > >>>
> > >>> Jon, if an exception happens in an enrichment or threat intel bolt the
> > >>> message is passed along with no error thrown (only logged).  Everywhere
> > >>> else I'm having trouble identifying specific fields that should be hashed.
> > >>> Would hashing the message in every case be acceptable?  Do you know of a
> > >>> place where we could hash a field instead?  On the topic of exceptions in
> > >>> enrichments, are we ok with an error only being logged and not added to the
> > >>> message or emitted to the error queue?
> > >>>
> > >>>
> > >>>
> > >>> On Tue, Jan 24, 2017 at 3:10 PM, Ryan Merriman <merrimanr@gmail.com>
> > >>> wrote:
> > >>>
> > >>>> That use case makes sense to me.  I don't think it will require that much
> > >>>> additional effort either.
> > >>>>
> > >>>> On Tue, Jan 24, 2017 at 1:02 PM, Zeolla@GMail.com <zeolla@gmail.com>
> > >>>> wrote:
> > >>>>
> > >>>>> Regarding error vs validation - Either way I'm not very concerned.  I
> > >>>>> initially assumed they would be combined and agree with that approach, but
> > >>>>> splitting them out isn't a very big deal to me either.
> > >>>>>
> > >>>>> Re: Ryan.  Yes, exactly.  In the case of a parser issue (or anywhere else
> > >>>>> where it's not possible to pick out the exact thing causing the issue) it
> > >>>>> would be a hash of the complete message.
> > >>>>>
> > >>>>> Regarding the architecture, I mostly agree with James except that I think
> > >>>>> step 3 needs to also be able to somehow group errors via the original
> > >>>>> data (identify replays, identify repeat issues with data in a specific
> > >>>>> field, issues with consistently different data, etc.).  This is essentially
> > >>>>> the first step of troubleshooting, which I assume you are doing if you're
> > >>>>> looking at the error dashboard.
> > >>>>>
> > >>>>> If the hash gets moved out of the initial implementation, I'm fairly
> > >>>>> certain you lose this ability.  The point here isn't to handle long fields
> > >>>>> (although that's a benefit of this approach), it's to attach a unique
> > >>>>> identifier to the error/validation issue message that links it to the
> > >>>>> original problem.  I'd be happy to consider alternative solutions to this
> > >>>>> problem (for instance, actually sending across the data itself) I just
> > >>>>> haven't been able to think of another way to do this that I like better.
> > >>>>>
> > >>>>> Jon
> > >>>>>
> > >>>>> On Tue, Jan 24, 2017 at 1:13 PM Ryan Merriman <merrimanr@gmail.com>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> We also need a JIRA for any install/Ansible/MPack work needed.
> > >>>>>>
> > >>>>>> On Tue, Jan 24, 2017 at 12:06 PM, James Sirota <jsirota@apache.org>
> > >>>>>> wrote:
> > >>>>>>> Now that I had some time to think about it I would collapse all error and
> > >>>>>>> validation topics into one.  We can differentiate between different views
> > >>>>>>> of the data (split by error source etc) via Kibana dashboards.  I would
> > >>>>>>> implement this feature incrementally.  First I would modify all the bolts
> > >>>>>>> to log to a single topic.  Second, I would get the error indexing done by
> > >>>>>>> attaching the indexing topology to the error topic.  Third I would create
> > >>>>>>> the necessary dashboards to view errors and validation failures by source.
> > >>>>>>> Lastly, I would file a follow-on JIRA to introduce hashing of errors or
> > >>>>>>> fields that are too long.  It seems like a separate feature that we need to
> > >>>>>>> think through.  We may need a stellar function around that.
> > >>>>>>> Thanks,
> > >>>>>>> James
> > >>>>>>>
> > >>>>>>> 24.01.2017, 10:25, "Ryan Merriman" <merrimanr@gmail.com>:
> > >>>>>>>> I understand what Jon is talking about. He's proposing we hash the value
> > >>>>>>>> that caused the error, not necessarily the error message itself. For an
> > >>>>>>>> enrichment this is easy. Just pass along the field value that failed
> > >>>>>>>> enrichment. For other cases the field that caused the error may not be so
> > >>>>>>>> obvious. Take parser validation for example. The message is validated as
> > >>>>>>>> a whole and it may not be easy to determine which field is the cause. In
> > >>>>>>>> that case would a hash of the whole message work?
> > >>>>>>>>
> > >>>>>>>> There is a broader architectural discussion that needs to happen before we
> > >>>>>>>> can implement this. Currently we have an indexing topology that reads from
> > >>>>>>>> 1 topic and writes messages to ES but errors are written to several
> > >>>>>>>> different topics:
> > >>>>>>>>
> > >>>>>>>>    - parser_error
> > >>>>>>>>    - parser_invalid
> > >>>>>>>>    - enrichments_error
> > >>>>>>>>    - threatintel_error
> > >>>>>>>>    - indexing_error
> > >>>>>>>>
> > >>>>>>>> I can see 4 possible approaches to implementing this:
> > >>>>>>>>
> > >>>>>>>>    1. Create an index topology for each error topic
> > >>>>>>>>       1. Good because we can easily reuse the indexing topology and would
> > >>>>>>>>       require the least development effort
> > >>>>>>>>       2. Bad because it would consume a lot of extra worker slots
> > >>>>>>>>    2. Move the topic name into the error JSON message as a new "error_type"
> > >>>>>>>>    field and write all messages to the indexing topic
> > >>>>>>>>       1. Good because we don't need to create a new topology
> > >>>>>>>>       2. Bad because we would be flowing data and errors through the same
> > >>>>>>>>       topology. A spike in errors could affect message indexing.
> > >>>>>>>>    3. Compromise between 1 and 2. Create another indexing topology that is
> > >>>>>>>>    dedicated to indexing errors. Move the topic name into the error JSON
> > >>>>>>>>    message as a new "error_type" field and write all errors to a single
> > >>>>>>>>    error topic.
> > >>>>>>>>    4. Write a completely new topology with multiple spouts (1 for each
> > >>>>>>>>    error type listed above) that all feed into a single
> > >>>>>>>>    BulkMessageWriterBolt.
> > >>>>>>>>       1. Good because the current topologies would not need to change
> > >>>>>>>>       2. Bad because it would require the most development effort, would
> > >>>>>>>>       not reuse existing topologies and takes up more worker slots than 3
> > >>>>>>>>
> > >>>>>>>> Are there other approaches I haven't thought of? I think 1 and 2 are off
> > >>>>>>>> the table because they are shortcuts and not good long-term solutions. 3
> > >>>>>>>> would be my choice because it introduces less complexity than 4.
> > >>>>>>>> Thoughts?
> > >>>>>>>>
> > >>>>>>>> Ryan
> > >>>>>>>>
> > >>>>>>>> On Mon, Jan 23, 2017 at 5:44 PM, Zeolla@GMail.com <zeolla@gmail.com>
> > >>>>>>>> wrote:
> > >>>>>>>>>  In that case the hash would be of the value in the IP field, such as
> > >>>>>>>>>  sha3(8.8.8.8).
> > >>>>>>>>>
> > >>>>>>>>>  Jon
> > >>>>>>>>>
> > >>>>>>>>>  On Mon, Jan 23, 2017, 6:41 PM James Sirota <jsirota@apache.org> wrote:
> > >>>>>>>>>  > Jon,
> > >>>>>>>>>  >
> > >>>>>>>>>  > I am still not entirely following why we would want to use hashing. For
> > >>>>>>>>>  > example if my error is "Your IP field is invalid and failed validation"
> > >>>>>>>>>  > hashing this error string will always result in the same hash. Why not
> > >>>>>>>>>  > just use the actual error string? Can you provide an example where you
> > >>>>>>>>>  > would use it?
> > >>>>>>>>>  >
> > >>>>>>>>>  > Thanks,
> > >>>>>>>>>  > James
> > >>>>>>>>>  >
> > >>>>>>>>>  > 23.01.2017, 16:29, "Zeolla@GMail.com" <zeolla@gmail.com>:
> > >>>>>>>>>  > > For 1 - I'm good with that.
> > >>>>>>>>>  > >
> > >>>>>>>>>  > > I'm talking about hashing the relevant content itself not the error. Some
> > >>>>>>>>>  > > benefits are (1) minimize load on search index (there's minimal benefit in
> > >>>>>>>>>  > > spending the CPU and disk to keep it at full fidelity (tokenize and store))
> > >>>>>>>>>  > > (2) provide something to key on for dashboards (assuming a good hash
> > >>>>>>>>>  > > algorithm that avoids collisions and is second preimage resistant) and (3)
> > >>>>>>>>>  > > specific to errors, if the issue is that it failed to index, a hash gives
> > >>>>>>>>>  > > us some protection that the issue will not occur twice.
> > >>>>>>>>>  > >
> > >>>>>>>>>  > > Jon
> > >>>>>>>>>  > >
> > >>>>>>>>>  > > On Mon, Jan 23, 2017, 2:47 PM James Sirota <jsirota@apache.org> wrote:
> > >>>>>>>>>  > >
> > >>>>>>>>>  > > Jon,
> > >>>>>>>>>  > >
> > >>>>>>>>>  > > With regards to 1, collapsing to a single dashboard for each would be
> > >>>>>>>>>  > > fine. So we would have one error index and one "failed to validate"
> > >>>>>>>>>  > > index. The distinction is that errors would be things that went wrong
> > >>>>>>>>>  > > during stream processing (failed to parse, etc...), while validation
> > >>>>>>>>>  > > failures are messages that explicitly failed stellar validation/schema
> > >>>>>>>>>  > > enforcement. There should be relatively few of the second type.
> > >>>>>>>>>  > >
> > >>>>>>>>>  > > With respect to 3, why do you want the error hashed? Why not just search
> > >>>>>>>>>  > > for the error text?
> > >>>>>>>>>  > >
> > >>>>>>>>>  > > Thanks,
> > >>>>>>>>>  > > James
> > >>>>>>>>>  > >
> > >>>>>>>>>  > > 20.01.2017, 14:01, "Zeolla@GMail.com" <zeolla@gmail.com>:
> > >>>>>>>>>  > >> As someone who currently fills the platform engineer role, I can give this
> > >>>>>>>>>  > >> idea a huge +1. My thoughts:
> > >>>>>>>>>  > >>
> > >>>>>>>>>  > >> 1. I think it depends on exactly what data is pushed into the index (#3).
> > >>>>>>>>>  > >> However, assuming the errors you proposed recording, I can't see huge
> > >>>>>>>>>  > >> benefits to having more than one dashboard. I would be happy to be
> > >>>>>>>>>  > >> persuaded otherwise.
> > >>>>>>>>>  > >>
> > >>>>>>>>>  > >> 2. I would say yes, storing the errors in HDFS in addition to indexing is
> > >>>>>>>>>  > >> a good thing. Using METRON-510
> > >>>>>>>>>  > >> <https://issues.apache.org/jira/browse/METRON-510> as a case study, there
> > >>>>>>>>>  > >> is the potential in this environment for attacker-controlled data to result
> > >>>>>>>>>  > >> in processing errors which could be a method of evading security
> > >>>>>>>>>  > >> monitoring. Once an attack is identified, the long term HDFS storage would
> > >>>>>>>>>  > >> allow better historical analysis for low-and-slow/persistent attacks (I'm
> > >>>>>>>>>  > >> thinking of a method of data exfil that also won't successfully get stored
> > >>>>>>>>>  > >> in Lucene, but is hard to identify over a short period of time).
> > >>>>>>>>>  > >> - Along this line, I think that there are various parts of Metron (this
> > >>>>>>>>>  > >> included) which could benefit from having method of configuring data aging
> > >>>>>>>>>  > >> by bucket in HDFS (Following Nick's comments here
> > >>>>>>>>>  > >> <https://issues.apache.org/jira/browse/METRON-477>).
> > >>>>>>>>>  > >>
> > >>>>>>>>>  > >> 3. I would potentially add a hash of the content that failed validation to
> > >>>>>>>>>  > >> help identify repeats over time with less of a concern that you'd have back
> > >>>>>>>>>  > >> to back failures (i.e. instead of storing the value itself). Additionally,
> > >>>>>>>>>  > >> I think it's helpful to be able to search all times there was an indexing
> > >>>>>>>>>  > >> error (instead of it hitting the catch-all).
> > >>>>>>>>>  > >> Jon
> > >>>>>>>>>  > >>
> > >>>>>>>>>  > >> On Fri, Jan 20, 2017 at 1:17 PM James Sirota <jsirota@apache.org> wrote:
> > >>>>>>>>>  > >>
> > >>>>>>>>>  > >> We already have a capability to capture bolt errors and validation errors
> > >>>>>>>>>  > >> and pipe them into a Kafka topic. I want to propose that we attach a
> > >>>>>>>>>  > >> writer topology to the error and validation failed kafka topics so that we
> > >>>>>>>>>  > >> can (a) create a new ES index for these errors and (b) create a new Kibana
> > >>>>>>>>>  > >> dashboard to visualize them. The benefit would be that errors and
> > >>>>>>>>>  > >> validation failures would be easier to see and analyze.
> > >>>>>>>>>  > >>
> > >>>>>>>>>  > >> I am seeking feedback on the following:
> > >>>>>>>>>  > >>
> > >>>>>>>>>  > >> - How granular would we want this feature to be? Think we would want one
> > >>>>>>>>>  > >> index/dashboard per source? Or would it be better to collapse everything
> > >>>>>>>>>  > >> into the same index?
> > >>>>>>>>>  > >> - Do we care about storing these errors in HDFS as well? Or is indexing
> > >>>>>>>>>  > >> them enough?
> > >>>>>>>>>  > >> - What types of errors should we record? I am proposing:
> > >>>>>>>>>  > >>
> > >>>>>>>>>  > >> For error reporting:
> > >>>>>>>>>  > >> --Message failed to parse
> > >>>>>>>>>  > >> --Enrichment failed to enrich
> > >>>>>>>>>  > >> --Threat intel feed failures
> > >>>>>>>>>  > >> --Generic catch-all for all other errors
> > >>>>>>>>>  > >>
> > >>>>>>>>>  > >> For validation reporting:
> > >>>>>>>>>  > >> --What part of message failed validation
> > >>>>>>>>>  > >> --What stellar validator caused the failure
> > >>>>>>>>>  > >>
> > >>>>>>>>>  > >> -------------------
> > >>>>>>>>>  > >> Thank you,
> > >>>>>>>>>  > >>
> > >>>>>>>>>  > >> James Sirota
> > >>>>>>>>>  > >> PPMC- Apache Metron (Incubating)
> > >>>>>>>>>  > >> jsirota AT apache DOT org
> > >>>>>>>>>  > >>
> > >>>>>>>>>  > >> --
> > >>>>>>>>>  > >>
> > >>>>>>>>>  > >> Jon
> > >>>>>>>>>  > >>
> > >>>>>>>>>  > >> Sent from my mobile device
> --

Jon
