metron-dev mailing list archives

From Michael Miklavcic <michael.miklav...@gmail.com>
Subject Re: [DISCUSS] JsonMapParser original string functionality
Date Fri, 10 May 2019 21:53:07 GMT
I think that's an excellent idea. Can anyone think of a situation where we
wouldn't want to add this the same way for all parsers? I suppose we could
always allow this to be overridden, also.
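
To make the idea concrete, here is a minimal sketch of stamping the original string one level above the parsers, with an opt-out for parsers that need their own value. It is not Metron's actual ParserRunnerImpl code: the class name, the method, and the parserMayOverride flag are all hypothetical, and the only thing taken from Metron is the "original_string" field name.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only; this is not the real ParserRunnerImpl.
final class RunnerOriginalStringSketch {
    static final String ORIGINAL_STRING = "original_string";

    /**
     * Attaches the raw input to every message a parser returned for it.
     *
     * @param rawMessage        bytes exactly as they were read from Kafka
     * @param parsed            messages the parser produced from those bytes
     * @param parserMayOverride hypothetical flag: when true, a value the
     *                          parser already set is left untouched
     */
    static List<Map<String, Object>> attachOriginalString(
            byte[] rawMessage,
            List<Map<String, Object>> parsed,
            boolean parserMayOverride) {
        // Decode once, with no JSON round trip, so the string is not normalized.
        String raw = new String(rawMessage, StandardCharsets.UTF_8);
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> message : parsed) {
            Map<String, Object> copy = new HashMap<>(message);
            if (parserMayOverride) {
                copy.putIfAbsent(ORIGINAL_STRING, raw); // keep the parser's value
            } else {
                copy.put(ORIGINAL_STRING, raw);         // runner always wins
            }
            out.add(copy);
        }
        return out;
    }
}
```

Calling attachOriginalString(bytes, messages, false) would give every message the same untouched input, while passing true leaves alone any message on which a parser has already set the field.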

On Fri, May 10, 2019 at 3:43 PM Nick Allen <nick@nickallen.org> wrote:

> I think maintaining the integrity of the original data makes a lot of sense
> for any parser. And ideally the original string should be what came out of
> Kafka with only the minimally necessary processing.
>
> With that in mind, we could solve this one level up.  Instead of relying on
> each parser to do this right, we could have the ParserRunner, and
> specifically the ParserRunnerImpl, handle this, roughly here [1].
> It has the raw message data and can append the original string to each
> message it gets back from the parsers.
>
> Just another approach to consider.
>
> --
> [1]
>
> https://github.com/apache/metron/blob/1b6ef88c79d60022542cda7e9abbea7e720773cc/metron-platform/metron-parsing/metron-parsers-common/src/main/java/org/apache/metron/parsers/ParserRunnerImpl.java#L149-L158
>
> On Fri, May 10, 2019 at 4:11 PM Otto Fowler <ottobackwards@gmail.com>
> wrote:
>
> > +1
> >
> >
> > On May 10, 2019 at 13:57:55, Michael Miklavcic (
> > michael.miklavcic@gmail.com)
> > wrote:
> >
> > When adding the capability for parsing messages in the JsonMapParser using
> > JSON Path expressions the original behavior for managing original strings
> > was changed.
> >
> >
> >
> https://github.com/apache/metron/blob/master/metron-platform/metron-parsing/metron-parsers-common/src/main/java/org/apache/metron/parsers/json/JSONMapParser.java#L192
> >
> > A couple issues have been reported recently regarding this change:
> >
> > 1. We're losing the actual original string, which is a legal issue for
> > data lineage for some customers
> > 2. Even for the degenerate case with no sub-messages created, the
> > original sub-message string is modified because of the
> > serialization/deserialization process with Jackson/JsonSimple. The fields
> > are reordered because the content is normalized.
> >
> > I looked at options for preserving formatting, but am unable to find a
> > method that allows you to both parse, then query the original message and
> > then also obtain the raw string matches without the normalizing from
> > ser/deserialization.
> >
> > I'd like to propose that we add a configuration option for this parser that
> > allows the user to toggle which approach they'd like to use. My personal
> > preference based on feedback I've gotten from multiple customers is that
> > the default should be the older approach which takes the raw original
> > string. It's arguable that this change in contract is a regression, so the
> > default should be the earlier behavior. Any sub-messages would then have a
> > copy of that raw original string, not just the sub-message original string.
> > Enabling the flag would enable the current sub-message original string
> > functionality.
> >
> > Mike
> >
>
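
The toggle proposed in the original message at the bottom of this thread could be sketched roughly as follows. This is an illustration under assumptions, not the JSONMapParser implementation: the SubMessage holder, the method name, and the useSubMessageOriginal option are hypothetical names made up for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the proposed option; not actual JSONMapParser code.
final class OriginalStringToggleSketch {
    static final String ORIGINAL_STRING = "original_string";

    /** A sub-message selected by a JSON Path query, plus its re-serialized text. */
    static final class SubMessage {
        final String serialized;          // normalized by the JSON library
        final Map<String, Object> fields; // parsed fields of the sub-message

        SubMessage(String serialized, Map<String, Object> fields) {
            this.serialized = serialized;
            this.fields = fields;
        }
    }

    /**
     * @param rawInput              the full message exactly as received
     * @param useSubMessageOriginal hypothetical option; false (the proposed
     *                              default) restores the earlier behavior
     */
    static List<Map<String, Object>> assignOriginalStrings(
            String rawInput,
            List<SubMessage> subMessages,
            boolean useSubMessageOriginal) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (SubMessage sub : subMessages) {
            Map<String, Object> message = new LinkedHashMap<>(sub.fields);
            // Default: every sub-message carries a copy of the raw input.
            // Flag on: each sub-message carries its own serialized string.
            message.put(ORIGINAL_STRING,
                    useSubMessageOriginal ? sub.serialized : rawInput);
            out.add(message);
        }
        return out;
    }
}
```

With the flag off, two sub-messages extracted from one input would both carry the full, byte-for-byte input; with it on, each would carry only its own re-serialized text, which is the current behavior described above.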
