metron-dev mailing list archives

From: Casey Stella <ceste...@gmail.com>
Subject: Re: [DISCUSS] Field conversions
Date: Tue, 05 Jun 2018 13:58:17 GMT

To be clear, I'm not even suggesting that we create any tooling here.  I'd
say just a reference to the ES docs and a call-out in Upgrading.md would
suffice as long as we have some strong reason to believe it'll work.  As
far as I'm concerned, the sooner we're out of the business of transforming
fields, the better.

On Tue, Jun 5, 2018 at 9:49 AM Justin Leet <justinjleet@gmail.com> wrote:

> ES does have some docs around how this gets handled in upgrades:
>
> https://www.elastic.co/guide/en/elasticsearch/reference/2.4/dots-in-names.html
>
> Might be worth taking a look to see what conflicts we'd have going from 2.x
> to 5.x and figuring out where to go from there.
>
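Before converting anything, it could help to know which fields would clash once ':' becomes '.'. In ES 5.x and later, a dotted name such as `source.type` is expanded into an object field `source` with a subfield `type`. The rename therefore breaks if `source` also exists as a plain field, or if two existing names collapse onto the same dotted name. Below is a minimal Python sketch. It assumes that the only difference between old and new names is ':' versus '.', and that the input lists leaf fields only. `find_conflicts` is a hypothetical helper, not part of Metron or ES.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_conflicts(
    field_names: Iterable[str],
) -> Tuple[Dict[str, List[str]], List[Tuple[str, str]]]:
    """Report problems that renaming ':' to '.' would cause in ES 5.x+.

    Assumes `field_names` lists leaf (non-object) fields only.
    Returns (collisions, object_conflicts):
      collisions: new name -> the distinct old names that map onto it
      object_conflicts: (leaf, dotted) pairs where `leaf` is a plain field
                        but `dotted` would need it to be an object
    """
    # Group the distinct original names by the dotted name they would become.
    converted: Dict[str, List[str]] = defaultdict(list)
    for original in set(field_names):
        converted[original.replace(":", ".")].append(original)

    collisions = {
        new: sorted(olds) for new, olds in converted.items() if len(olds) > 1
    }

    # Every proper prefix of a dotted name becomes an object mapping,
    # so it must not also exist as a concrete field.
    object_conflicts = []
    for new in sorted(converted):
        parts = new.split(".")
        for i in range(1, len(parts)):
            prefix = ".".join(parts[:i])
            if prefix in converted:
                object_conflicts.append((prefix, new))
    return collisions, object_conflicts
```

For example, `find_conflicts(["source:type", "source", "ip_src_addr"])` returns `({}, [("source", "source.type")])`. In this case, `source` would have to be both a string and an object after the rename.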
> On Tue, Jun 5, 2018 at 9:46 AM, Simon Elliston Ball <simon@simonellistonball.com> wrote:
>
> > I guess in principle you could use
> > https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html#docs-reindex-change-name
> > to reindex with the new fields. It wouldn't be hard to script up a bit of
> > Python to help users out with that, or of course to leave that as an
> > exercise for the reader. It would be nice to have a script that read and
> > transformed fields for templates and indices to replace the colons with
> > dots in ES.
> >
> > Simon
> >
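The per-document part of such a script is small. Below is a minimal Python sketch. It assumes each document's `_source` is JSON-like (dicts, lists, scalars) and that only keys should change, never values. `colons_to_dots` is a hypothetical name. Fetching documents and writing them to the new index is deliberately left out; that step could use scroll and bulk requests, or the same logic could move into a reindex script.

```python
from typing import Any


def colons_to_dots(value: Any) -> Any:
    """Return a copy of a JSON-like document with ':' replaced by '.' in keys.

    Values are left untouched, so an IP such as "10.0.0.1" is not altered.
    Raises ValueError if two keys in the same object map to one new name.
    """
    if isinstance(value, dict):
        renamed = {}
        for key, inner in value.items():
            new_key = key.replace(":", ".")
            if new_key in renamed:
                raise ValueError(f"field name collision on {new_key!r}")
            renamed[new_key] = colons_to_dots(inner)
        return renamed
    if isinstance(value, list):
        # Arrays of objects are renamed element by element.
        return [colons_to_dots(item) for item in value]
    return value
```

Running it on `{"source:type": "bro", "ip_src_addr": "10.0.0.1"}` yields `{"source.type": "bro", "ip_src_addr": "10.0.0.1"}`. The function raises on a collision rather than silently overwriting one of the fields. This surfaces data that already mixes ':' and '.' names.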
> > On 5 June 2018 at 06:40, Casey Stella <cestella@gmail.com> wrote:
> >
> > > +1 to that, Simon.  Do we have a sense of whether there are utilities
> > > provided by ES to do this kind of migration transformation easily?
> > >
> > > On Tue, Jun 5, 2018 at 9:37 AM Simon Elliston Ball <simon@simonellistonball.com> wrote:
> > >
> > > > I would definitely agree that the transformation should be removed. We
> > > > have now, however, added a complex generic solution in the backend, which
> > > > is going to be a no-op for most people. This was done, I believe, for the
> > > > sake of backward compatibility. I would argue, however, that there is no
> > > > need to support ES 2.3, and therefore no need to support de-dotting
> > > > transformations. This does seem somewhat over-engineered to me, though it
> > > > does save people re-indexing on upgrades. I suspect that in reality this
> > > > is a rare edge case, and that we would do far better to settle on one
> > > > solution (the dotted version, not the colons, to my mind).
> > > >
> > > > Simon
> > > >
> > > > On 5 June 2018 at 06:29, Ryan Merriman <merrimanr@gmail.com> wrote:
> > > >
> > > > > I agree completely.  I will leave this thread open for a day or two to
> > > > > give others a chance to weigh in.  If no one opposes, I will create
> > > > > Jiras for removing field transformations and transforming existing data.
> > > > >
> > > > > On Tue, Jun 5, 2018 at 8:21 AM, Casey Stella <cestella@gmail.com> wrote:
> > > > >
> > > > > > Well, on write it is a transformation; on read it's a translation.
> > > > > > This is to say that you're providing a mapping on read to translate
> > > > > > field names given the index you're using.  The other approach that I
> > > > > > was considering last night is a field transformation REST call, which
> > > > > > the UI could call to translate field names.  So, the UI would pass
> > > > > > 'source.type' to the field translation service, and in Solr it'd
> > > > > > return source.type and in ES it'd return source:type.  Under the hood,
> > > > > > the service would use the same transformation as the writer uses.
> > > > > > That's another way to skin this cat.
> > > > > >
> > > > > > Ultimately, I think we should just ditch this field transformation
> > > > > > business, as Laurens said, as long as we have a utility to transform
> > > > > > existing data.
> > > > > >
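The REST-call alternative comes down to one translation function that both the writer and the read path call, so the two cannot disagree. Below is a minimal Python sketch. It assumes that canonical names use '.' and that an ES backend still follows the ES 2.x ':' convention. `Backend`, `translate_field`, and `to_stored_document` are hypothetical names, not Metron's API, and the REST layer itself is omitted.

```python
from enum import Enum
from typing import Any, Dict


class Backend(Enum):
    SOLR = "solr"
    ELASTICSEARCH = "elasticsearch"


def translate_field(name: str, backend: Backend) -> str:
    """Map a canonical (dotted) field name to the name a backend stores.

    Assumption: only ES indices written under the ES 2.x restriction need
    the '.' -> ':' de-dotting; Solr keeps the name as-is.
    """
    if backend is Backend.ELASTICSEARCH:
        return name.replace(".", ":")
    return name


def to_stored_document(doc: Dict[str, Any], backend: Backend) -> Dict[str, Any]:
    """Writer side: apply the same translation to every key before indexing."""
    return {translate_field(key, backend): value for key, value in doc.items()}
```

A handler behind the proposed REST call could return `translate_field("source.type", Backend.ELASTICSEARCH)`, which is `"source:type"`, while Solr gets `"source.type"` back. Ditching the transformation entirely, as suggested above, would reduce `translate_field` to the identity.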
> > > > > > On Tue, Jun 5, 2018 at 8:54 AM Ryan Merriman <merrimanr@gmail.com> wrote:
> > > > > >
> > > > > > > Having two different patterns for configuring field name
> > > > > > > transformations on read vs. write is confusing to me.  I agree with
> > > > > > > both of you that normalizing on '.' and not having to do the
> > > > > > > translation at all would be ideal.  Like you both suggested, we would
> > > > > > > need some utility or script to convert preexisting data to match this
> > > > > > > format.  There could also be some adjustments a user would need to
> > > > > > > make in the UI, but I feel like we could document around that.  Are
> > > > > > > there any objections to doing it this way?
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Jun 4, 2018 at 4:30 PM, Laurens Vets <laurens@daemon.be> wrote:
> > > > > > >
> > > > > > > > ES 2.x support officially ended 4 months ago
> > > > > > > > (https://www.elastic.co/support/eol), so why still support ':' at
> > > > > > > > all? :)
> > > > > > > > Additionally, 2.x isn't even supported at all on the last two
> > > > > > > > Ubuntu LTS releases (16.04 & 18.04).
> > > > > > > >
> > > > > > > > Therefore, move everything to use '.' and provide a
> > > > > > > > conversion/upgrade script to change ':' to '.'?
> > > > > > > >
> > > > > > > >
> > > > > > > > On 2018-06-04 13:55, Ryan Merriman wrote:
> > > > > > > >
> > > > > > > > >> We've been dealing with a recurring challenge in Metron.  It is
> > > > > > > > >> common for various fields to contain '.' characters for the
> > > > > > > > >> purpose of making them more readable, namespacing, etc.  At one
> > > > > > > > >> point we only supported Elasticsearch 2.3, which did not allow
> > > > > > > > >> dots and forced us to use ':' instead.  This limitation does not
> > > > > > > > >> exist in later versions of Elasticsearch or Solr.
> > > > > > > >>
> > > > > > > > >> Now we're in a situation where we need to allow a user to use
> > > > > > > > >> either one, because they may still be using ES 2.3 or have data
> > > > > > > > >> with ':' characters in field names.  We've attempted to make this
> > > > > > > > >> configurable in a couple of different PRs:
> > > > > > > >>
> > > > > > > >> https://github.com/apache/metron/pull/1022
> > > > > > > >> https://github.com/apache/metron/pull/1010
> > > > > > > >> https://github.com/apache/metron/pull/1038
> > > > > > > >>
> > > > > > > > >> The approaches taken in these are not consistent and fall short
> > > > > > > > >> in different ways.  The first (METRON-1569 Allow user to change
> > > > > > > > >> field name conversion when indexing) only applies to indexing and
> > > > > > > > >> not querying.  The others only apply to a single field, which does
> > > > > > > > >> not scale well.  Now we have an issue with another field in
> > > > > > > > >> https://issues.apache.org/jira/browse/METRON-1600.  Rather than
> > > > > > > > >> continuing with a patchwork of different fixes, I want to attempt
> > > > > > > > >> to design a system-wide solution.
> > > > > > > >>
> > > > > > > > >> My first thought is to expand
> > > > > > > > >> https://github.com/apache/metron/pull/1022 to apply globally.
> > > > > > > > >> However, this is not trivial and would require significant
> > > > > > > > >> changes.  It would also make
> > > > > > > > >> https://github.com/apache/metron/pull/1010 obsolete, and we might
> > > > > > > > >> end up having to revert all of it.
> > > > > > > >>
> > > > > > > > >> Does anyone have any ideas or opinions?  I am still researching
> > > > > > > > >> solutions but would love some guidance from the community.
> > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > simon elliston ball
> > > > @sireb
> > > >
> > >
> >
> >
> >
> > --
> > simon elliston ball
> > @sireb
> >
>
