metron-dev mailing list archives

From Ali Nazemian <alinazem...@gmail.com>
Subject Re: [DISCUSS] Handling dropped messages in REGEX_SELECT with Kafka topic routing
Date Mon, 07 Jan 2019 12:51:35 GMT
One thing to bear in mind: publishing an error may cause operational
challenges, since it fills up the error topic as well as the Storm logs,
which may not be necessary. Wearing my Metron user hat, dropping the message
with a debug/trace-level log entry stating that the event was filtered out
makes sense. If we want to make this really fancy, it would be nice to have
the flexibility to decide what happens next, since options 2 and 3 would be
required in some special cases (though that makes it a bit more complex).
Of course, the default can be to drop with the ack.
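
To make the "decide what happens next" idea a bit more concrete, here is a
rough Python sketch. It is not Metron code. The names OnNoMatch, regex_select
and route are hypothetical, the sample field values are made up, and I am
assuming the pattern is applied with search semantics. The "enrichments"
fallback is only there to mirror option 2 from the quoted mail. Whatever
the policy, the tuple would still be acked.

```python
import re
from enum import Enum


class OnNoMatch(Enum):
    """Hypothetical policy for records that match no REGEX_SELECT pattern."""
    DROP = "drop"            # proposed default: drop quietly (debug/trace log)
    ERROR = "error"          # option 1: publish to the error queue
    DEFAULT_TOPIC = "topic"  # options 2 and 3: write to a fallback topic


def regex_select(value, patterns):
    """Return the first key whose pattern matches the value, else None."""
    for key, pattern in patterns.items():
        if re.search(pattern, value):
            return key
    return None


def route(value, patterns, on_no_match=OnNoMatch.DROP,
          default_topic="enrichments"):
    """Decide what to do with one record; returns (action, topic).

    The caller is expected to ack the tuple for every action,
    including "drop", so that nothing ends up in a replay loop.
    """
    topic = regex_select(value, patterns)
    if topic is not None:
        return ("emit", topic)
    if on_no_match is OnNoMatch.ERROR:
        return ("error", None)
    if on_no_match is OnNoMatch.DEFAULT_TOPIC:
        return ("emit", default_topic)
    return ("drop", None)


if __name__ == "__main__":
    patterns = {"world": "^Hello "}            # same shape as the quoted config
    print(route("Hello there", patterns))      # matches, goes to "world"
    print(route("Bye now", patterns))          # no match, default policy
    print(route("Bye now", patterns, OnNoMatch.DEFAULT_TOPIC))
```

With the default policy the "Bye" record is simply dropped, which is the
behaviour the thread already agrees on. The other two policies would only
kick in if someone explicitly configures them.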

Cheers,
Ali

On Thu, Dec 20, 2018 at 8:18 AM Michael Miklavcic <
michael.miklavcic@gmail.com> wrote:

> Completely agreed on the acking. The reason I posed the question to begin
> with was because, while I believe dropping+acking is the correct
> functionality, I could see a few alternative patterns for handling this:
>
>    1. Require filtering to be handled by the message filter infrastructure
>    and publish an error to the error queue if field transformations such as
>    REGEX_SELECT violate this by dropping messages.
>    2. Default records to be written to enrichments, or handle per my
>    comments in #1
>    3. Default records to be written to the topic defined by outputTopic
>    (non-default version of #2)
>
> At any rate, we should fix the acking problem and then the dropped messages
> pattern makes sense to me. I've created a Jira to track it -
> https://issues.apache.org/jira/browse/METRON-1948.
>
> On Wed, Dec 19, 2018 at 12:43 PM Casey Stella <cestella@gmail.com> wrote:
>
> > We absolutely should be acking the dropped messages, otherwise they'll be
> > in a replay loop.  Not acking is a flat-out bug IMO.
> >
> > On Wed, Dec 19, 2018 at 2:37 PM Michael Miklavcic <
> > michael.miklavcic@gmail.com> wrote:
> >
> > > When a message is filtered by the message filtering mechanism, we
> > > explicitly drop the message (and presumably ack it in Storm), as
> > > explained here -
> > > https://github.com/apache/metron/tree/master/metron-platform/metron-parsing#filtered
> > > .
> > > When using the REGEX_SELECT field transformation (see here -
> > > https://github.com/apache/metron/tree/master/metron-platform/metron-parsing#fieldtransformation-configuration
> > > )
> > > with the kafka.topicField option for parser-chaining, it's unclear to me
> > > whether we expect the same behavior (drop message, ack it). The
> > > interpretation I get from this example in the parser-chaining doc
> > > https://github.com/apache/metron/tree/master/use-cases/parser_chaining#the-pix_syslog_router-parser
> > > suggests to me that the approach we take for messages with message
> > > filtering is the correct one, however in testing an example with dropped
> > > messages, we appear not to ack those dropped messages.
> > >
> > > Before I go creating a fix I thought it best to summarize and confirm my
> > > expectations on this functionality. Messages from a REGEX_SELECT that
> > > don't match a pattern, and therefore don't get a value assigned to their
> > > output topic value, should be dropped and acked.
> > >
> > > *Example:*
> > > {
> > >   "parserClassName": "org.apache.metron.parsers.GrokParser",
> > >   "sensorTopic": "myInTopic",
> > >   ...
> > >   "parserConfig": {
> > >     ...,
> > >     "kafka.topicField": "output_topic"
> > >   },
> > >   "fieldTransformations": [
> > >     {
> > >       "input": [
> > >         "message"
> > >       ],
> > >       "output": [
> > >         "output_topic"
> > >       ],
> > >       "transformation": "REGEX_SELECT",
> > >       "config": {
> > >         "world": "^Hello "
> > >       }
> > >     },
> > >     ...
> > >   ]
> > > }
> > >
> > > *Input Records:*
> > > "...sshd[32469]: Hello..."
> > > "...sshd[30432]: Bye..."
> > >
> > > *Output:*
> > > Kafka topic = "world" (as determined by the REGEX_SELECT pattern match
> > > that sets the "output_topic" property used by kafka.topicField)
> > > 1 record present
> > > contents of that record = our record with "Hello" in it
> > > 1 record is dropped ("Bye" record) and will not be forwarded any further
> > > through the pipeline.
> > >
> >
>


-- 
A.Nazemian
