metron-dev mailing list archives

From James Sirota <jsir...@apache.org>
Subject Re: [DISCUSS] Management of Elastic and other index schemas
Date Sat, 18 Feb 2017 17:26:53 GMT
I am not sure I agree with packaging source-specific templates with the parser. I think that
would make it harder to add additional storage sources. For example, what happens if I have
50 parsers with Solr and ES schemas defined, and now I want to add Druid? I would have to add
a Druid schema to each of my 50 existing parsers, which I don't think makes sense. What we
should have instead is tuple mappers that map some internal representation of our schema to
whatever schema the tool uses. We have already started down this path, with Kyle
defining the schema enum for his ASA parser PR and Simon defining a JSON schema for his CEF
parser PR. I think we need to unify these approaches and then propagate them to all the parsers.
I think what has to happen is the following:

We have to introduce a partial schema for Metron messages, where you can enforce a schema on
the parts of a message you care about while leaving the rest of the message free-form. What I
mean by that is that you should enforce a schema for things like ip, protocol, timestamp, etc.,
but have a fully flexible structure outside of that.
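
A rough Python sketch of what such a partial schema check might look like. The field
names (ip_src_addr, protocol, timestamp), the CORE_SCHEMA table and the validate_partial
helper are all assumptions made up for illustration; none of them is an existing Metron API.

```python
# Hypothetical core schema: only these fields are type-checked.
# Anything else in the message is left completely unconstrained.
CORE_SCHEMA = {
    "ip_src_addr": str,
    "protocol": str,
    "timestamp": int,  # assumed here to be epoch milliseconds
}


def validate_partial(message, schema=CORE_SCHEMA):
    """Return a list of problems found in the schema-governed fields only."""
    errors = []
    for field, expected in schema.items():
        if field not in message:
            continue  # in this sketch a core field may simply be absent
        value = message[field]
        if type(value) is not expected:
            errors.append(
                f"{field}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return errors


message = {
    "ip_src_addr": "10.0.0.1",
    "protocol": "tcp",
    "timestamp": 1487438813000,
    "vendor_blob": {"anything": ["goes", "here"]},  # not in the schema, not checked
}
print(validate_partial(message))
```

Whether a missing core field should count as an error is a design choice this sketch leaves
open; here it is simply skipped.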

After you do that, you can map the partial schema you defined to ES, Solr, Druid, etc. For
the fields you don't have a schema for, you just assume they are strings. To add an
additional storage/indexing source to Metron, all you do is define a mapper to that source's
schema and load that into our indexing bolt.
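
As a minimal sketch of the mapper idea in Python: one internal schema, one small type table
per storage target, and a string fallback for anything without a schema. INTERNAL_SCHEMA,
ES_TYPES, SOLR_TYPES and map_fields are hypothetical names, and the target type names are
illustrative placeholders that would need checking against the actual ES or Solr version in use.

```python
# Hypothetical internal schema: field name -> abstract type.
INTERNAL_SCHEMA = {
    "ip_src_addr": "ip",
    "protocol": "string",
    "timestamp": "timestamp",
}

# One type table per storage target (illustrative type names only).
ES_TYPES = {"ip": "ip", "string": "keyword", "timestamp": "date"}
SOLR_TYPES = {"ip": "string", "string": "string", "timestamp": "pdate"}


def map_fields(field_names, type_table, schema=INTERNAL_SCHEMA):
    """Map each field to the target's type; fields with no schema become strings."""
    return {
        name: type_table[schema.get(name, "string")]
        for name in field_names
    }


fields = ["ip_src_addr", "timestamp", "vendor_blob"]
print(map_fields(fields, ES_TYPES))
print(map_fields(fields, SOLR_TYPES))
```

Under these assumptions, supporting another target means adding one more type table rather
than touching every parser.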



Thanks,
James

17.02.2017, 16:36, "Zeolla@GMail.com" <zeolla@gmail.com>:
> I think this is a good direction to move things toward - moving indexing
> templates to be packaged with parsers (using multiple tiered options) that
> are then merged with the possible enrich fields before getting added to the
> indexing technology in use. Now, to read the proposal thread...
>
> Jon
>
> On Fri, Feb 17, 2017, 4:25 PM Simon Elliston Ball <
> simon@simonellistonball.com> wrote:
>
>>  I’d broadly agree with that tiered approach.
>>
>>  The version where the parser emits a generic schema, and enrichments
>>  contribute generic schema chunks to that which get combined into an indexer
>>  specific template generated at the end of the flow, so yes, pretty much
>>  in line with your proposal. (I did read through it, apologies if I missed any
>>  of the detail, brain is still a little bit post-RSA!)
>>
>>  Simon
>>
>>  > On 17 Feb 2017, at 12:38, Otto Fowler <ottobackwards@gmail.com> wrote:
>>  >
>>  > We already make them do this now, or they get the defaults. So this is
>>  > no different.
>>  > Having parsers emit names and types etc, that would be another step - or
>>  > it could be the ‘generic schema’ as implemented actually.
>>  >
>>  > A tiered approach - from
>>  > * you give nothing with the parser - you get whatever ES guesses at but
>>  >   you don’t care do you
>>  > * you give the schema
>>  > * you give the types and we figure it out for you
>>  >
>>  > would be the best to move to.
>>  >
>>  > Also, we could use the names and types method tied to enrichment to
>>  > generate indexing templates for enrichment types or deriving them rather,
>>  > which i mention in my proposal.
>>  >
>>  > I’m starting to think you haven’t rushed out to read it Simon ;)
>>  >
>>  >
>>  >
>>  > On February 17, 2017 at 15:24:37, Simon Elliston Ball (simon@simonellistonball.com) wrote:
>>  >
>>  >> I like that, to an extent… Forcing the provision of explicit schema
>>  >> might be a bit of a load for parser development. I’m assuming that custom
>>  >> parsers would be pushed towards the same packaging approach.
>>  >>
>>  >> Would it make sense to require the parser to emit field names and types
>>  >> expected, and then for us to provide a means of creating the templates for
>>  >> supported indices, and push the actual template management to the index
>>  >> layer rather than the parsing layer. Schema is after all determined not
>>  >> just by a parser, but also by the combination of enrichments and models
>>  >> applied.
>>  >>
>>  >> We could also of course provide an override option within your proposed
>>  >> parser package model to allow any destination specific configuration of the
>>  >> indexing template.
>>  >>
>>  >> Simon
>>  >>
>>  >> > On 17 Feb 2017, at 12:01, Otto Fowler <ottobackwards@gmail.com> wrote:
>>  >> >
>>  >> > I think we can get there from my proposal.
>>  >> > A source may package:
>>  >> > * explicit schemas ( ES, SOLR, FOO )
>>  >> > * a generic to be invented schema for a to be invented pluggable indexing
>>  >> >   component :)
>>  >> > and we’ll be able to handle it.
>>  >> >
>>  >> >
>>  >> >
>>  >> > On February 17, 2017 at 14:39:07, Kyle Richardson (kylerichardson2@gmail.com) wrote:
>>  >> >
>>  >> > I personally like the idea of a typed schema per parser that we could
>>  >> > translate to multiple targets. This would allow us a lot more modularity
>>  >> > and extensibility in indexing down the road.
>>  >> >
>>  >> > -Kyle
>>  >> >
>>  >> > On Fri, Feb 17, 2017 at 1:59 PM, Simon Elliston Ball <simon@simonellistonball.com> wrote:
>>  >> >
>>  >> >> That sounds like a great idea Otto. Do you have any early design on that
>>  >> >> we can look at? Also, rather than just elastic templates do you think we
>>  >> >> should have some sort of typed schema we could translate to multiple
>>  >> >> targets (solr, elastic, ur... other...) or are you thinking of packaging
>>  >> >> specific schema assets like template json with the parser?
>>  >> >>
>>  >> >> Simon
>>  >> >>
>>  >> >>> On 17 Feb 2017, at 18:42, Otto Fowler <ottobackwards@gmail.com> wrote:
>>  >> >>>
>>  >> >>>
>>  >> >>> Not to jump the gun, but I’m crafting a proposal about parsers and one
>>  >> >>> of the things I am going to propose relates to having the ES Template for
>>  >> >>> a given parser installed or packaged with the parser. We could load the
>>  >> >>> template from there, edit, save and deploy etc. We can extend that concept
>>  >> >>> more and more later (drafts, versioning etc.)
>>  >> >>>
>>  >> >>>
>>  >> >>>> On February 17, 2017 at 13:22:45, Simon Elliston Ball (simon@simonellistonball.com) wrote:
>>  >> >>>>
>>  >> >>>> A little while ago the issue of managing Elastic templates for new
>>  >> >>>> sensor configs came up, and we didn’t quite put it to bed.
>>  >> >>>>
>>  >> >>>> When creating new sensors, I almost invariably find the auto-generated
>>  >> >>>> schemas for elastic pick some incorrect types. I also find I have to
>>  >> >>>> recreate indexes every time to push in the proper dynamic templates for
>>  >> >>>> things like geo enrichment fields.
>>  >> >>>>
>>  >> >>>> So, my questions are:
>>  >> >>>> How should we address elastic template for new sensors?
>>  >> >>>> Do we have circumstances where we would need to configure types, or can
>>  >> >>>> we get away with inferring them?
>>  >> >>>> Should we just add some additional dynamic templates to cover our
>>  >> >>>> common fields like timestamp (the most common culprit I find for incorrect
>>  >> >>>> typing)?
>>  >> >>>>
>>  >> >>>> I’d also like to think about ways we can generalise this. Does anyone
>>  >> >>>> have any thoughts on what sort of additional index schemes we should want
>>  >> >>>> to infer (solr seems an obvious one, any others?).
>>  >> >>>>
>>  >> >>>> Thoughts on a well typed, schemaed and easily indexed postcard please :)
>>  >> >>>>
>>  >> >>>> Simon
>>  >> >>
>>
>>  --
>
> Jon
>
> Sent from my mobile device

------------------- 
Thank you,

James Sirota
PPMC- Apache Metron (Incubating)
jsirota AT apache DOT org
