metron-dev mailing list archives

From Otto Fowler <ottobackwa...@gmail.com>
Subject Re: [DISCUSS] Metron Parsers in Nifi
Date Thu, 09 Aug 2018 15:42:30 GMT
I would say that

- For each configuration parameter we want to pull in, it should be
explicitly configured through a property as well as through a controller
service that accesses the metron zk
- Transformations should not be conflated with parsing in those processors
or readers

There is no on-the-fly configuration change in NiFi (you can’t change a
processor's properties once it has started).

Wouldn’t the simplest minimal start be to say that we expect either nifi or
metron and simplify things?  Let nifi nifi, let metron metron.


On August 9, 2018 at 10:53:24, Justin Leet (justinjleet@gmail.com) wrote:

That's definitely good info, thanks for reaching out to them about it.

In terms of exposing/sharing, I don't think we have to couple them tightly
(in fact, I think we should loosen the coupling as much as possible without
forcing reimplementation of things). I think there's definitely a way to do
that in terms of the general purpose processor I proposed (or in terms of
RecordReader or another implementation).

It would definitely be easy enough to configure it to either pull from ZK
or to use a parser config json extract as a parameter (to maintain the same
formatting and make migration easy).  And we can still build specific
NiFi-oriented parsers as needed (that manage things like Schema via the
registry and other Nifi mechanisms).  This keeps parsers entirely decoupled
from a metron installation.
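
To make the "either pull from ZK or take the JSON as a parameter" idea concrete, here is a minimal Java sketch. All names in it (ParserConfigSource, InlineConfigSource, InMemoryZkConfigSource, ConfigResolver) are hypothetical and are not Metron or NiFi APIs. The in-memory map only stands in for ZooKeeper, and letting the inline property win over the shared store is an assumption, not something decided in this thread.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical abstraction: where a parser's JSON config comes from. */
interface ParserConfigSource {
    /** Returns the raw JSON config for the given sensor type, if any. */
    Optional<String> configFor(String sensorType);
}

/** Config supplied directly as a processor property (no Metron install needed). */
final class InlineConfigSource implements ParserConfigSource {
    private final String sensorType;
    private final String json;

    InlineConfigSource(String sensorType, String json) {
        this.sensorType = sensorType;
        this.json = json;
    }

    @Override
    public Optional<String> configFor(String requested) {
        return sensorType.equals(requested) ? Optional.of(json) : Optional.empty();
    }
}

/** Stand-in for a ZK-backed source; a real one would read from ZooKeeper. */
final class InMemoryZkConfigSource implements ParserConfigSource {
    private final Map<String, String> nodes = new ConcurrentHashMap<>();

    void put(String sensorType, String json) {
        nodes.put(sensorType, json);
    }

    @Override
    public Optional<String> configFor(String sensorType) {
        return Optional.ofNullable(nodes.get(sensorType));
    }
}

final class ConfigResolver {
    private ConfigResolver() {}

    /** Assumed precedence: explicit property first, shared store second. */
    static Optional<String> resolve(String sensorType,
                                    ParserConfigSource inline,
                                    ParserConfigSource shared) {
        Optional<String> fromProperty = inline.configFor(sensorType);
        return fromProperty.isPresent() ? fromProperty : shared.configFor(sensorType);
    }
}
```

Because both sources hand back the same JSON text, the parser code downstream would not need to know which mechanism supplied it, which is the "same format, if not the same mechanism" point.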

Alternatively, we extract our config handling into a module and scripts that we
can package up to easily deploy configs against ZK (or maybe NiFi's
StateControllers or whatever).  We definitely shouldn't need absolutely
everything installed to be able to run just parsers on NiFi.

Having said that, right now the easiest way we have to maintain on the fly
updatable configs (and updatable is important!) is via ZK.  Params in Nifi
aren't quite that flexible, to the best of my knowledge (i.e. you have to
stop, update config and restart). We might be able to exploit the
StateController to manage this for us, but I'm honestly not familiar enough
with it, and for deployments split between NiFi and Storm, it means
configuration gets managed in a couple of different ways (which may be fine with
users, since there is a fairly bright-line delineation that makes it easier to
accept).  There are some complicated configs like fieldTransforms, which is
part of why I would like things to be configured in the same format (if not
the same mechanism).
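
As a rough illustration of what "on the fly updatable" means for a running parser, here is a small Java sketch. UpdatableConfig is a hypothetical class, not part of Metron or NiFi, and the component that would call onConfigChanged (for example a ZK watcher) is assumed and not shown.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical holder that lets a running parser pick up new config without a restart. */
final class UpdatableConfig {
    private final AtomicReference<String> current;

    UpdatableConfig(String initialJson) {
        this.current = new AtomicReference<>(initialJson);
    }

    /** Called by whatever watches the config store (assumed; not shown here). */
    void onConfigChanged(String newJson) {
        current.set(newJson);
    }

    /** Called per message; returns the most recently published config. */
    String snapshot() {
        return current.get();
    }
}
```

A parse loop that calls snapshot() for each message sees an update as soon as it is published, with no stop, edit, and restart cycle.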

Ideally, in my mind, the parsers shared between both NiFi and Storm just
implement the very general MessageParser interface (which is pretty
minimal, a couple setup methods, validation, and the actual parse).  This
is pretty lightweight and the split of metron-parsers into
metron-parsers-common et al. would loosen the coupling between parsers and
the rest of metron into that core needed to support that.
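
For readers who have not looked at the parser contract, the shape described above (a couple of setup methods, validation, and the parse itself) can be sketched roughly as follows. This is a simplified, hypothetical stand-in written for illustration, not the actual MessageParser source. SketchParser, KeyValueSketchParser, and the "delimiter" config key are all made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical, simplified stand-in for a platform-agnostic parser contract. */
interface SketchParser {
    void configure(Map<String, Object> config);    // setup: receive parser config
    void init();                                    // setup: allocate resources
    List<Map<String, Object>> parse(byte[] raw);    // raw bytes in, field maps out
    boolean validate(Map<String, Object> message);  // sanity-check a parsed message
}

/** Toy parser: splits "key=value" pairs on a configurable delimiter. */
final class KeyValueSketchParser implements SketchParser {
    private String delimiter = " ";

    @Override
    public void configure(Map<String, Object> config) {
        Object d = config.get("delimiter");
        if (d != null) {
            delimiter = d.toString();
        }
    }

    @Override
    public void init() {
        // Nothing to allocate in this toy example.
    }

    @Override
    public List<Map<String, Object>> parse(byte[] raw) {
        String line = new String(raw, StandardCharsets.UTF_8).trim();
        Map<String, Object> message = new LinkedHashMap<>();
        message.put("original_string", line);
        // Quote the delimiter so characters like "|" are not treated as regex.
        for (String pair : line.split(Pattern.quote(delimiter))) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                message.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        List<Map<String, Object>> out = new ArrayList<>();
        out.add(message);
        return out;
    }

    @Override
    public boolean validate(Map<String, Object> message) {
        // Valid here means: the raw line is kept and at least one field was extracted.
        return message.containsKey("original_string") && message.size() > 1;
    }
}
```

Nothing in the sketch refers to Storm or NiFi, so a Storm bolt or a NiFi processor could each wrap the same implementation and supply only the plumbing.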

IMO, at that point, we'd have a pretty minimal NAR (or NARs depending on
config management) that lets us run our set of parsers, lets users build
new parsers (and doesn't block specialized NiFi implementations that exploit
NiFi's feature set), and lets us get things configured in a relatively
consistent manner, without losing features, and hopefully requiring a
pretty minimal slice of Metron to be useful.

On Thu, Aug 9, 2018 at 10:06 AM Otto Fowler <ottobackwards@gmail.com> wrote:

> I think the benefits are clear.  What is unclear is if the goal is to
> expose or share or re-use Metron capabilities ( stellar, parsing ) in nifi
> in a way that is native to nifi ( configured and managed in nifi ), where
> you may not even need metron ( say you just want to parse asa ) or if the
> goal is to have a hybrid approach coupling the processors/readers to the
> metron installation.
>
>
> On August 9, 2018 at 09:14:58, Justin Leet (justinjleet@gmail.com) wrote:
>
> I'll add onto Mike's discussion with the original set of requirements I had
> in mind (and apply feedback on these as necessary!). This is largely
> overlap with what Mike said, but I want to make sure it's clear where my
> proposal was coming from, so we can improve on it as needed. James and
> Mike are also right, I think I skipped over the benefits of NiFi in general
> a bit, so thanks for chiming in there.
>
> - Deploy our bundled parsers without needing custom wrapping on all of
> them.
> - Don't prevent ourselves from building custom wrapping as needed.
> - Custom Java parsers with an easy way to hook in, similar to what we
> already do in Storm.
> - One stop (or at least one format) configuration, for the case when we're
> doing some things in NiFi (parsers) and some elsewhere (enrichment and
> indexing). I don't think it'll always be "start in NiFi, end in Storm",
> especially as we build out Stellar capability, but I also don't want users
> learning a different set of configs and config tools for every platform we
> run on.
> - Ability to build out parsers and other systems fairly easily, e.g. Spark.
> - Support our current use cases (in particular parser chaining as a more
> advanced use case).
>
> It really boils down to providing a relatively simple user path to be able
> to migrate to NiFi as needed or desired as simply as possible in a very
> general way, while not preventing parser by parser enhancements.
>
> On Wed, Aug 8, 2018 at 7:14 PM Michael Miklavcic <
> michael.miklavcic@gmail.com> wrote:
>
> > I think it also provides customers greater control over their
> architecture
> > by giving them the flexibility to choose where/how to host their parsers.
> >
> > To Justin's point about the API, my biggest concern about the
> RecordReader
> > approach is that it is not stable. We already have a similar problem in
> > having the TransportClient in ElasticSearch - they are prone to changing
> it
> > in minor versions with the advent of their newer REST API, which is
> > problematic for ensuring a stable installation.
> >
> > From my own perspective, our goal with NiFi, at least in part, should be
> > the ability to deploy our core parsing infrastructure, i.e.
> >
> > - pre-built parsers
> > - custom java parsers
> > - Stellar transforms
> > - custom stellar transforms
> >
> > And have the ability to configure it similarly to how we configure
> parsers
> > within Storm. Consistent with our recent parser chaining and aggregation
> > feature, users should be able to construct and deploy similar constructs
> in
> > NiFi. The core architectural shift would be that parser code should be
> > platform agnostic. We provide the plumbing in Storm, NiFi, and <Spark
> > Streaming?, other> and platform architects and devops teams can choose
> how
> > and where to deploy.
> >
> > Best,
> > Mike
> >
> >
> > On Wed, Aug 8, 2018 at 9:57 AM James Sirota <jsirota@apache.org> wrote:
> >
> > > Integration with NiFi would be useful for parsing low-volume
> telemetries
> > > at the edge. This is a much more resource friendly way to do it than
> > > setting up dedicated storm topologies. The integration would be that
> the
> > > NiFi processor parses the data and pushes it straight into the
> enrichment
> > > topic, saving us the resources of having multiple parsers in storm
> > >
> > > Thanks,
> > > James
> > >
> > > 07.08.2018, 11:29, "Otto Fowler" <ottobackwards@gmail.com>:
> > > > Why don't we start over? We are going back and forth on implementation,
> > and
> > > I
> > > > don’t think we have the same goals or concerns.
> > > >
> > > > What would be the requirements or goals of metron integration with
> > Nifi?
> > > > How many levels or options for integration do we have?
> > > > What are the approaches to choose from?
> > > > Who are the target users?
> > > >
> > > > On August 7, 2018 at 12:24:56, Justin Leet (justinjleet@gmail.com)
> > > wrote:
> > > >
> > > > So how does the MetronRecordReader roll into everything? It seems
> like
> > > it'd
> > > > be more useful on the reader per format approach, but otherwise it
> > > doesn't
> > > > really seem like we gain much, and it requires getting everything
> > linked
> > > up
> > > > properly to be used. Assuming we looked at doing it that way, is the
> > idea
> > > > that we'd setup a ControllerService with the MetronRecordReader and a
> > > > MetronRecordWriter and then have the StellarTransformRecord processor
> > > > configured with those ControllerServices? How do we manage the
> > > > configurations of everything that way? How does the
> > ControllerService
> > > > get configured with whatever parser(s) are needed in the flow?
> > Basically,
> > > > what's your vision for how everything would tie together?
> > > >
> > > > I also forgot to mention this in the original writeup, but there's
> > > another
> > > > reason to avoid the RecordReader: It's not considered stable. See
> > > >
> > >
> >
> https://github.com/apache/nifi/blob/master/nifi-commons/nifi-record/src/main/java/org/apache/nifi/serialization/RecordReader.java#L34
> > > .
> > > > That alone makes me super hesitant to use it, if it can shift out
> from
> > > > under us in even an incremental version.
> > > >
> > > > I'm also unclear on why StellarTransformRecord processor matters for
> > > either
> > > > approach. With the Processor approach you could simply follow it up
> > with
> > > > the Stellar processor, the same way you would in the RecordReader
> > > > approach. The Stellar processor should be a parallel improvement,
> not a
> > > > conflicting one.
> > > >
> > > > On Tue, Aug 7, 2018 at 11:50 AM Otto Fowler <ottobackwards@gmail.com
> >
> > > wrote:
> > > >
> > > >> A Metron Processor itself isn’t really necessary. A
> > MetronRecordReader
> > > (
> > > >> either the megalithic or a reader per format ) would be a good
> > > approach.
> > > >> Then have StellarTransformRecord processor that can do Stellar on
> > _any_
> > > >> record, regardless of source.
> > > >>
> > > >> On August 7, 2018 at 11:06:22, Justin Leet (justinjleet@gmail.com)
> > > wrote:
> > > >>
> > > >> Thanks for the comments, Otto, this is definitely great feedback.
> I'd
> > > >> love to respond inline, but the email's already starting to lose
> its
> > > >> formatting, so I'll go with the classic "wall of text". Let me know
> > if
> > > I
> > > >> didn't address everything.
> > > >>
> > > >> Loading modules (or jars or whatever) outside of our Processor gives
> > us
> > > >> the benefit of making it incredibly easy for users to create their
> > > own
> > > >> parsers. I would definitely expect our own bundled parsers to be
> > > included
> > > >> in our base NAR, but loading modules enables users to only have to
> > > learn
> > > >> how Metron wants our stuff lined up and just plug it in. Having said
> > > that,
> > > >> I could see having a wrapper for our bundled parsers that makes it
> > > really
> > > >> easy to just say you want an MetronAsaParser or MetronBroParser,
> etc.
> > > That
> > > >> would give us the best of both worlds, where it's easy to set up
> > our
> > > >> bundled parsers and also trivial to pull in non-bundled parsers.
> What
> > > >> doing this gives us is an easy way to support (hopefully) every
> > parser
> > > that
> > > >> gets made, right out of the box, without us needing to build a
> > > specialized
> > > >> version of everything until we decide to and without users having
to
> > > jump
> > > >> through hoops.
> > > >>
> > > >> None of this prevents anyone from creating specialized parsers (for
> > > perf
> > > >> reasons, or to use the schema registries, or anything else). It's
> > > probably
> > > >> worthwhile to package up some of the built-in parsers and customize them
> > > to use
> > > >> more specialized features appropriately as we see things get used in
> > the
> > > >> wild. Like you said, we could likely provide Avro schemas for some
> of
> > > this
> > > >> and give users a more robust experience on what we choose to support
> > > and
> > > >> provide guidance for other things. I'm also worried that building
> > > >> specialized schemas becomes problematic for things like parser
> > chaining
> > > >> (where our routers wrap the underlying messages and add on their own
> > > info).
> > > >> Going down that road potentially requires anything wrapped to have
a
> > > >> specialized schema for the wrapped version in addition to a vanilla
> > > version
> > > >> (although please correct me if I'm missing something there, I'll
> > openly
> > > >> admit to some shakiness on how that would be handled).
> > > >>
> > > >> I also disagree that this is un-Nifi-like, although I'm admittedly
> > not
> > > as
> > > >> skilled there. The basis for doing this is directly inspired by the
> > > >> JoltTransformer, which is extremely similar to the proposed setup
> for
> > > our
> > > >> parsers: Simply take a spec (in this case the configs, including the
> > > >> fieldTransformations), and delegate a mapping from bytes[] to JSON.
> > The
> > > >> Jolt library even has an Expression Language (check out
> > > >>
> > >
> >
> https://community.hortonworks.com/articles/105965/expression-language-with-jolt-in-apache-nifi.html
> > > ),
> > > >> so it's not a foreign concept. I believe Simon Ball has already done
> > > some
> > > >> experimenting around with getting Stellar running in NiFi, and I'd
> > > love to
> > > >> see Stellar more readily available in NiFi in general.
> > > >>
> > > >> Re: the ControllerService, I see this as a way to maintain Metron's
> > > use of
> > > >> ZK as the source of config truth. Users could definitely be using
> > NiFi
> > > and
> > > >> Storm in tandem (parse in NiFi + enrich and index from Storm, for
> > > >> example). Using the ControllerService gives us a ZK instance as the
> > > single
> > > >> source of truth. That way we aren't forcing users to go to two
> > > different
> > > >> places to manage configs. This also lets us leverage our existing
> > > scripts
> > > >> and our existing infrastructure around configs and their management
> > and
> > > >> validation very easily. It also gives users a way to port from NiFi
> > to
> > > >> Storm or vice-versa without having to migrate configs as well. We
> > could
> > > >> also provide the option to configure the Processor itself with the
> > data
> > > >> (just don't set up a controller service and provide the json or
> > > whatever as
> > > >> one of our properties).
> > > >>
> > > >> On Tue, Aug 7, 2018 at 10:12 AM Otto Fowler <
> ottobackwards@gmail.com
> > >
> > > >> wrote:
> > > >>
> > > >>> I think this is a good idea. As I mentioned in the other thread
> I’ve
> > > >>> been doing a lot of work on Nifi recently.
> > > >>> I think the important thing is that what is done should be done
the
> > > NiFi
> > > >>> way, not bolting the Metron composition
> > > >>> onto Nifi. Think of it like the Tao of Unix, the parsers and
> > > components
> > > >>> should be single purpose and simple, allowing
> > > >>> exceptional flexibility in composition.
> > > >>>
> > > >>> Comments inline.
> > > >>>
> > > >>> On August 7, 2018 at 09:27:01, Justin Leet (justinjleet@gmail.com)
> > > wrote:
> > > >>>
> > > >>> Hi all,
> > > >>>
> > > >>> There's interest in being able to run Metron parsers in NiFi,
> rather
> > > than
> > > >>>
> > > >>> inside Storm. I dug into this a bit, and have some thoughts on
how
> > we
> > > >>> could
> > > >>> go about this. I'd love feedback on this, along with anything
we'd
> > > >>> consider must haves as well as future enhancements.
> > > >>>
> > > >>> 1. Separate metron-parsers into metron-parsers-common and
> > metron-storm
> > > >>> and create metron-parsers-nifi. For this code to be reusable across
> > > >>> platforms (NiFi, Storm, and anything else in the future), we'll
> need
> > > to
> > > >>> decouple our parsers and Storm.
> > > >>>
> > > >>> +1. The “parsing code” should be a library that implements
an
> > > interface
> > > >>> ( another library ).
> > > >>>
> > > >>> The Processors and the Storm things can share them.
> > > >>>
> > > >>> - There's also some nice fringe benefits around refactoring our
> code
> > > >>> to be substantially more clear and understandable; something
> > > >>> which came up
> > > >>> while allowing for parser aggregation.
> > > >>> 2. Create a MetronProcessor that can run our parsers.
> > > >>> - I took a look at how RecordReader could be leveraged (e.g.
> > > >>> CSVRecordReader), but this is pretty tightly tied into schemas
> > > >>> and is meant
> > > >>> to be used by ControllerServices, which are then used by
> Processors.
> > > >>> There's friction involved there in terms of schemas, but also
in
> > > terms of
> > > >>>
> > > >>> access to ZK configs and things like parser chaining. We might
> > > >>> be able to
> > > >>> leverage it, but it seems like it'd be fairly shoehorned in
> > > >>> without getting
> > > >>> the schema and other benefits.
> > > >>>
> > > >>> We won’t have to provide our ‘no schema processors’ ( grok,
csv,
> > json
> > > ).
> > > >>>
> > > >>> All the remaining processors DO have schemas that we know about.
We
> > > can
> > > >>> just provide the avro schemas the same way we provide the ES
> > schemas.
> > > >>>
> > > >>> The “parsing” should not be conflated with the transform/stellar
in
> > > >>> NiFi. We should make that separate. Running Stellar over Records
> > > would be
> > > >>> the best thing.
> > > >>>
> > > >>> - This Processor would work similarly to Storm: bytes[] in ->
JSON
> > > >>> out.
> > > >>> - There is a Processor
> > > >>> <
> > > >>>
> > >
> >
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/JoltTransformJSON.java
> > > >>> >
> > > >>> that
> > > >>> handles loading other JARs that we can model a
> > > >>> MetronParserProcessor off of
> > > >>> that handles classpath/classloader issues (basically just sets
up a
> > > >>> classloader specific to what's being loaded and swaps out the
> > Thread's
> > > >>> loader when it calls to outside resources).
> > > >>>
> > > >>> There should be no reason to load modules outside the NAR. Why
do
> > you
> > > >>> expect to? If each Metron Processor equiv of a Metron Storm Parser
> > is
> > > just
> > > >>> parsing to json it shouldn’t need much. And we could package
them in
> > > the
> > > >>> NAR. I would suggest we have a Processor per Parser to allow for
> > > >>> specialization. It should all be in the nar.
> > > >>>
> > > >>> The Stellar Processor, if you would support the works would
> possibly
> > > need
> > > >>> this.
> > > >>>
> > > >>> 3. Create a MetronZkControllerService to supply our configs to
our
> > > >>> processors.
> > > >>> - This is a pretty established NiFi pattern for being able to
> > provide
> > > >>> access to other services needed by a Processor (e.g. databases
or
> > > large
> > > >>> configurations files).
> > > >>> - The same controller service can be used by all Processors to
> > manage
> > > >>> configs in a consistent manner.
> > > >>>
> > > >>> I think controller services would make sense where needed, I’m
just
> > > not
> > > >>> sure what you imagine them being needed for?
> > > >>>
> > > >>> If the user has NiFi, and a Registry etc, are you saying you
> imagine
> > > them
> > > >>> using Metron + ZK to manage configurations? Or to be using BOTH
> > storm
> > > >>> processors and Nifi Processors?
> > > >>>
> > > >>> At that point, we can just NAR our controller service and parser
> > > processor
> > > >>>
> > > >>> up as needed, deploy them to NiFi, and let the user provide a
> config
> > > for
> > > >>> where their custom parsers can be provided (i.e. their parser
jar).
> > > This
> > > >>> would be 3 nars (processor, controller-service, and
> > > controller-service-api
> > > >>>
> > > >>> in order to bind the other two together).
> > > >>>
> > > >>> Once deployed, our ability to use parsers should fit well into
the
> > > >>> standard
> > > >>> NiFi workflow:
> > > >>>
> > > >>> 1. Create a MetronZkControllerService.
> > > >>> 2. Configure the service to point at zookeeper.
> > > >>> 3. Create a MetronParser.
> > > >>> 4. Configure it to use the controller service + parser jar location
> > +
> > > >>> any other needed configs.
> > > >>> 5. Use the outputs as needed downstream (either writing out to
> Kafka
> > > or
> > > >>> feeding into more MetronParsers, etc.)
> > > >>>
> > > >>> Chaining parsers should ideally become a matter of chaining
> > > MetronParsers
> > > >>>
> > > >>> (and making sure the enveloping configs carry through properly).
> For
> > > >>> parser
> > > >>> aggregation, I'd just avoid it entirely until we know it's needed
> in
> > > NiFi.
> > > >>>
> > > >>> Justin
> > >
> > > -------------------
> > > Thank you,
> > >
> > > James Sirota
> > > PMC- Apache Metron
> > > jsirota AT apache DOT org
> > >
> > >
> >
>
>
