ignite-dev mailing list archives

From Andrey Mashenkov <andrey.mashen...@gmail.com>
Subject Re: IEP-54: Schema-first approach for 3.0
Date Mon, 23 Nov 2020 12:38:52 GMT
Hi Igniters,

I'd like to continue discussion of IEP-54 (Schema-first approach).

Hope everyone who is interested had a chance to get familiar with the
proposal [1].
Please do not hesitate to ask questions and share your ideas.

I've prepared a prototype of serializer [2] for the data layout described
in the proposal.
In the prototype, I compared two approaches to (de)serializing objects. The
first one uses the Java reflection/Unsafe API and is similar to the one we
already use in Ignite. The second one generates a serializer for a particular
user class and uses the Janino library for compilation.
The second one shows better results in benchmarks.
I think we can go with it as the default serializer and keep the
reflection-based implementation as a fallback in case someone has issues
with the generated one.
WDYT?
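
To illustrate the default-plus-fallback arrangement, here is a minimal Java
sketch. It is not the prototype code from [2]. All names (ObjectSerializer,
ReflectionSerializer, PersonGeneratedSerializer, Serializers, Person) are
hypothetical, the "generated" serializer is written by hand as a stand-in for
what a code generator might emit, and no Janino call is shown.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical contract; not the API of the prototype in [2]. */
interface ObjectSerializer<T> {
    List<Object> serialize(T obj);
}

/** Sample user class used only for this illustration. */
final class Person {
    final long id;
    final String name;

    Person(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

/** Reflection-based variant: discovers instance fields at runtime. */
final class ReflectionSerializer<T> implements ObjectSerializer<T> {
    private final List<Field> fields = new ArrayList<>();

    ReflectionSerializer(Class<T> cls) {
        for (Field f : cls.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic())
                continue;
            f.setAccessible(true);
            fields.add(f);
        }
        // Sort by name so the column order does not depend on the JVM.
        fields.sort(Comparator.comparing(Field::getName));
    }

    @Override public List<Object> serialize(T obj) {
        List<Object> row = new ArrayList<>();
        try {
            for (Field f : fields)
                row.add(f.get(obj));
        }
        catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return row;
    }
}

/** Hand-written stand-in for a per-class serializer a generator could emit. */
final class PersonGeneratedSerializer implements ObjectSerializer<Person> {
    @Override public List<Object> serialize(Person p) {
        List<Object> row = new ArrayList<>();
        row.add(p.id);   // Direct field access, no reflection.
        row.add(p.name);
        return row;
    }
}

final class Serializers {
    /** Tries the generated serializer first, falls back to reflection. */
    static <T> ObjectSerializer<T> create(Supplier<ObjectSerializer<T>> generated, Class<T> cls) {
        try {
            return generated.get();
        }
        catch (RuntimeException | LinkageError e) {
            // Assumption: generation or class loading failed for this class.
            return new ReflectionSerializer<>(cls);
        }
    }
}
```

Both variants produce the same row for the same object in this sketch, so the
fallback can be swapped in without callers noticing.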

There are a number of tasks under the umbrella ticket [3] waiting for an
assignee.

BTW, I'm going to create more tickets for schema manager modes
implementation, but would like to clarify some details.

I thought the schema manager on each node should hold:
  1. A local mapping of "schema version" <--> validated local key/value
class pair.
  2. The cluster-wide schema change history.
On the client side, before any key-value API operation we should validate
the schema for a given key-value pair.
If no local mapping exists for the given key-value pair, or if the
cluster-wide schema has a more recent version, then the key-value pair
should be validated against the latest version and the local mapping should
be updated.
If an object doesn't fit the latest schema, then the outcome depends on the
schema mode: either fail the operation ('strict' mode), or create a new
mapping and propagate a new schema version to the cluster.
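
The client-side flow above can be sketched roughly as follows. This is only
an illustration under simplifying assumptions: a schema is reduced to a set
of column names, a class is validated through a caller-supplied set of field
names, "fits" means the latest schema contains all of those fields, and
appending to the local history list stands in for propagating a new version
to the cluster. ClientSchemaManager, Mode.EVOLVING, PersonV1 and PersonV2
are hypothetical names; only the 'strict' mode is named in this message.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Placeholder user classes; their fields are passed in explicitly below. */
final class PersonV1 {}
final class PersonV2 {}

final class ClientSchemaManager {
    /** STRICT comes from the message; EVOLVING is a made-up name. */
    enum Mode { STRICT, EVOLVING }

    private final Mode mode;

    /** Schema change history: index + 1 is the schema version. */
    private final List<Set<String>> history = new ArrayList<>();

    /** Local mapping: user class -> schema version it was validated against. */
    private final Map<Class<?>, Integer> localMapping = new HashMap<>();

    ClientSchemaManager(Mode mode, Set<String> initialColumns) {
        this.mode = mode;
        history.add(new HashSet<>(initialColumns));
    }

    int latestVersion() {
        return history.size();
    }

    /** Validates a class before a key-value operation; returns the version used. */
    int validate(Class<?> cls, Set<String> classFields) {
        int latest = latestVersion();
        Integer known = localMapping.get(cls);

        // Mapping exists and is up to date: nothing to do.
        if (known != null && known == latest)
            return latest;

        Set<String> latestColumns = history.get(latest - 1);

        if (!latestColumns.containsAll(classFields)) {
            if (mode == Mode.STRICT)
                throw new IllegalStateException("Class does not fit schema version " + latest);

            // Non-strict mode: create a new schema version (stand-in for
            // propagating the change to the cluster).
            Set<String> next = new HashSet<>(latestColumns);
            next.addAll(classFields);
            history.add(next);
            latest = latestVersion();
        }

        localMapping.put(cls, latest);
        return latest;
    }
}
```

In strict mode a class with an extra field is rejected, while in the
non-strict mode the same call creates schema version 2 and older classes are
then revalidated against it.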

On the server side we usually have no key-value classes and operate on
tuples.
Since the schema change history is available and a tuple carries its schema
version, it is possible to upgrade any received tuple to the latest version
without deserialization.
Thus we could allow nodes to send key-value pairs of previous versions (if
they haven't received a schema update yet) without reverting schema changes
made by a node with newer classes.
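
A rough sketch of such a server-side upgrade, assuming a tuple is just a
schema version plus an array of column values and that columns are matched
by name (renames and type changes are ignored, new columns get null).
VersionedTuple and SchemaHistory are hypothetical names, not classes from
the IEP or the prototype.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical tuple: values laid out in the column order of its schema version. */
final class VersionedTuple {
    final int schemaVersion;
    final Object[] values;

    VersionedTuple(int schemaVersion, Object[] values) {
        this.schemaVersion = schemaVersion;
        this.values = values;
    }
}

/** Hypothetical schema history: index + 1 is the schema version. */
final class SchemaHistory {
    private final List<List<String>> versions = new ArrayList<>();

    /** Registers a new schema version and returns its number. */
    int add(List<String> columns) {
        versions.add(new ArrayList<>(columns));
        return versions.size();
    }

    int latest() {
        return versions.size();
    }

    /** Upgrades a tuple to the latest version without touching user classes. */
    VersionedTuple upgrade(VersionedTuple t) {
        if (t.schemaVersion == latest())
            return t;

        List<String> from = versions.get(t.schemaVersion - 1);
        List<String> to = versions.get(latest() - 1);

        Object[] out = new Object[to.size()];

        for (int i = 0; i < to.size(); i++) {
            int src = from.indexOf(to.get(i));

            // Keep existing column values, default added columns to null;
            // columns dropped in newer versions are simply not copied.
            out[i] = src >= 0 ? t.values[src] : null;
        }

        return new VersionedTuple(latest(), out);
    }
}
```

A node that still writes version 1 tuples can keep doing so; the receiving
side remaps them to the latest layout and leaves the schema untouched.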

Alex, Val, Ivan, did you mean the same?


[1]
https://cwiki.apache.org/confluence/display/IGNITE/IEP-54%3A+Schema-first+Approach
[2] https://github.com/apache/ignite/tree/ignite-13618/modules/commons
[3] https://issues.apache.org/jira/browse/IGNITE-13616

On Thu, Sep 17, 2020 at 9:21 AM Ivan Pavlukhin <vololo100@gmail.com> wrote:

> Folks,
>
> Please do not ignore history. We had a thread [1] with many bright
> ideas. We can resume it.
>
> [1]
> http://apache-ignite-developers.2346864.n4.nabble.com/Applicability-of-term-cache-to-Apache-Ignite-td36541.html
>
> 2020-09-10 0:08 GMT+03:00, Denis Magda <dmagda@apache.org>:
> > Val, makes sense, thanks for explaining.
> >
> > Agree that we need to have a separate discussion thread for the "table"
> > and "cache" terms substitution. I'll appreciate it if you start the thread
> > sharing pointers to any relevant IEPs and reasoning behind the suggested
> > change.
> >
> > -
> > Denis
> >
> >
> > On Tue, Sep 8, 2020 at 6:01 PM Valentin Kulichenko <
> > valentin.kulichenko@gmail.com> wrote:
> >
> >> Hi Denis,
> >>
> >> I guess the wording in the IEP is a little bit confusing. All it means is
> >> that you should not create nested POJOs, but rather inline fields into a
> >> single POJO that is mapped to a particular schema. In other words, nested
> >> POJOs are not supported.
> >>
> >> Alex, is this correct? Please let me know if I'm missing something.
> >>
> >> As for the "cache" term, I agree that it is outdated, but I'm not sure
> >> what we can replace it with. "Table" is tightly associated with SQL, but
> >> SQL is optional in our case. Do you want to create a separate discussion
> >> about this?
> >>
> >> -Val
> >>
> >> On Tue, Sep 8, 2020 at 4:37 PM Denis Magda <dmagda@apache.org> wrote:
> >>
> >>> Val,
> >>>
> >>> I've checked the IEP again and have a few questions.
> >>>
> >>> Arbitrary nested objects and collections are not allowed as column
> >>> values.
> >>> > Nested POJOs should either be inlined into schema, or stored as BLOBs
> >>>
> >>>
> >>> Could you provide a DDL code snippet showing how the inlining of POJOs
> >>> is supposed to work?
> >>>
> >>> Also, we keep using the terms "cache" and "table" throughout the IEP. Is
> >>> it the right time to discuss an alternate name that would replace those
> >>> too? Personally, the "table" should stay and the "cache" should go
> >>> considering that SQL is one of the primary APIs in Ignite and that DDL
> >>> is supported out-of-the-box.
> >>>
> >>>
> >>> -
> >>> Denis
> >>>
> >>>
> >>> On Mon, Sep 7, 2020 at 12:26 PM Valentin Kulichenko <
> >>> valentin.kulichenko@gmail.com> wrote:
> >>>
> >>> > Ivan,
> >>> >
> >>> > I see your point. I agree that with the automatic updates we step into
> >>> > the schema-last territory.
> >>> >
> >>> > Actually, if we support automatic evolution, we can as well support
> >>> > creating a cache without schema and inferring it from the first
> >>> > insert. In other words, we can have both "schema-first" and
> >>> > "schema-last" modes.
> >>> >
> >>> > Alexey, what do you think?
> >>> >
> >>> > -Val
> >>> >
> >>> > On Mon, Sep 7, 2020 at 5:59 AM Alexey Goncharuk <
> >>> > alexey.goncharuk@gmail.com>
> >>> > wrote:
> >>> >
> >>> > > Ivan,
> >>> > >
> >>> > > Thank you, I got your concern now. As it is mostly regarding the
> >>> > > terminology, I am absolutely fine with changing the name to whatever
> >>> > > fits the approach best. Dynamic or evolving schema sounds great. I
> >>> > > will make corresponding changes to the IEP once we settle on the name.
> >>> > >
> >>> > > Mon, Sep 7, 2020 at 11:33, Ivan Pavlukhin <vololo100@gmail.com>:
> >>> > >
> >>> > > > Hi Val,
> >>> > > >
> >>> > > > Thank you for your answer!
> >>> > > >
> >>> > > > My understanding is a little bit different. Yes, schema evolution
> >>> > > > definitely should be possible. But I see a main difference in "how
> >>> > > > schema is updated". I treat a common SQL approach as schema-first.
> >>> > > > Schema and data manipulation operations are clearly separated and
> >>> > > > it enables interesting capabilities, e.g. preventing unintended
> >>> > > > schema changes by mistaken data operations, restricting user
> >>> > > > permissions to change schema.
> >>> > > >
> >>> > > > > Schema-first means that schema exists in advance and all the
> >>> > > > > stored data is compliant with it - that's exactly what is
> >>> > > > > proposed.
> >>> > > >
> >>> > > > A schema-last approach mentioned in [1] also assumes that schema
> >>> > > > exists, but it is inferred from data. Isn't it more similar to the
> >>> > > > proposed approach?
> >>> > > >
> >>> > > > And I would like to say that my main concern so far is mostly
> >>> > > > about terminology. And I suppose if it confuses me then others
> >>> > > > might be confused as well. My feeling is closer to "dynamic or
> >>> > > > liquid or maybe evolving schema".
> >>> > > >
> >>> > > > [1]
> >>> > > > https://people.cs.umass.edu/~yanlei/courses/CS691LL-f06/papers/SH05.pdf
> >>> > > >
> >>> > > > 2020-09-07 0:47 GMT+03:00, Valentin Kulichenko <
> >>> > > > valentin.kulichenko@gmail.com>:
> >>> > > > > Hi Ivan,
> >>> > > > >
> >>> > > > > I don't see an issue with that. Schema-first means that schema
> >>> > > > > exists in advance and all the stored data is compliant with it -
> >>> > > > > that's exactly what is proposed. There are no restrictions
> >>> > > > > prohibiting changes to the schema.
> >>> > > > >
> >>> > > > > -Val
> >>> > > > >
> >>> > > > > On Sat, Sep 5, 2020 at 9:52 PM Ivan Pavlukhin <vololo100@gmail.com>
> >>> > > > > wrote:
> >>> > > > >
> >>> > > > >> Alexey,
> >>> > > > >>
> >>> > > > >> I am a little bit confused with terminology. My understanding
> >>> > > > >> conforms to a survey [1] (see part X Semi Structured Data). Can we
> >>> > > > >> really treat a "dynamic schema" approach as a kind of
> >>> > > > >> "schema-first"?
> >>> > > > >>
> >>> > > > >> [1]
> >>> > > > >> https://people.cs.umass.edu/~yanlei/courses/CS691LL-f06/papers/SH05.pdf
> >>> > > > >>
> >>> > > > >> 2020-09-02 1:53 GMT+03:00, Denis Magda <dmagda@apache.org>:
> >>> > > > >> >>
> >>> > > > >> >> However, could you please elaborate on the relation between
> >>> > > > >> >> Ignite and ORM? Is there a use case for Hibernate running on
> >>> > > > >> >> top of Ignite (I haven't seen one so far)? If so, what is
> >>> > > > >> >> missing exactly on the Ignite side to support this? In my
> >>> > > > >> >> understanding, all you need is SQL API which we already have.
> >>> > > > >> >> Am I missing something?
> >>> > > > >> >
> >>> > > > >> >
> >>> > > > >> > Good point, yes, if all the ORM integrations use Ignite SQL
> >>> > > > >> > APIs internally, then they can easily translate an Entity
> >>> > > > >> > object into an INSERT/UPDATE statement that lists all the
> >>> > > > >> > object's fields. Luckily, our Spring Data integration is
> >>> > > > >> > already based on the Ignite SQL APIs and needs to be improved
> >>> > > > >> > once the schema-first approach is supported. That would solve
> >>> > > > >> > a ton of usability issues.
> >>> > > > >> >
> >>> > > > >> > I would revise the Hibernate integration as well during the
> >>> > > > >> > Ignite 3.0 dev phase. Can't say if it's used a lot but Spring
> >>> > > > >> > Data is getting traction for sure.
> >>> > > > >> >
> >>> > > > >> > @Michael Pollind, I'll loop you in as long as you've started
> >>> > > > >> > working on the Ignite support for Micronaut Data
> >>> > > > >> > <https://micronaut-projects.github.io/micronaut-data/latest/guide/>
> >>> > > > >> > and came across some challenges. Just watch this discussion.
> >>> > > > >> > That's what is coming in Ignite 3.0.
> >>> > > > >> >
> >>> > > > >> >
> >>> > > > >> > -
> >>> > > > >> > Denis
> >>> > > > >> >
> >>> > > > >> >
> >>> > > > >> > On Mon, Aug 31, 2020 at 5:11 PM Valentin Kulichenko <
> >>> > > > >> > valentin.kulichenko@gmail.com> wrote:
> >>> > > > >> >
> >>> > > > >> >> Hi Denis,
> >>> > > > >> >>
> >>> > > > >> >> Generally speaking, I believe that the schema-first approach
> >>> > > > >> >> natively addresses the issue of duplicate fields in key and
> >>> > > > >> >> value objects, because schema will be created for a cache, not
> >>> > > > >> >> for an object, as it happens now. Basically, the schema will
> >>> > > > >> >> define whether there is a primary key or not, and which fields
> >>> > > > >> >> are included in case there is one. Any API that we would have
> >>> > > > >> >> must be compliant with this, so it becomes fairly easy to work
> >>> > > > >> >> with data as with a set of records, rather than key-value
> >>> > > > >> >> pairs.
> >>> > > > >> >>
> >>> > > > >> >> However, could you please elaborate on the relation between
> >>> > > > >> >> Ignite and ORM? Is there a use case for Hibernate running on
> >>> > > > >> >> top of Ignite (I haven't seen one so far)? If so, what is
> >>> > > > >> >> missing exactly on the Ignite side to support this? In my
> >>> > > > >> >> understanding, all you need is SQL API which we already have.
> >>> > > > >> >> Am I missing something?
> >>> > > > >> >>
> >>> > > > >> >> -Val
> >>> > > > >> >>
> >>> > > > >> >> On Mon, Aug 31, 2020 at 2:08 PM Denis Magda <dmagda@apache.org>
> >>> > > > >> >> wrote:
> >>> > > > >> >>
> >>> > > > >> >> > Val,
> >>> > > > >> >> >
> >>> > > > >> >> > I would propose adding another point to the motivations list
> >>> > > > >> >> > which is related to the ORM frameworks such as Spring Data,
> >>> > > > >> >> > Hibernate, Micronaut and many others.
> >>> > > > >> >> >
> >>> > > > >> >> > Presently, the storage engine requires distinguishing key
> >>> > > > >> >> > objects from value ones, which complicates the usage of
> >>> > > > >> >> > Ignite with those ORM frameworks (especially if a key object
> >>> > > > >> >> > comprises several fields). More on this can be found here:
> >>> > > > >> >> >
> >>> > > > >> >> > http://apache-ignite-developers.2346864.n4.nabble.com/DISCUSSION-Key-and-Value-fields-with-same-name-and-SQL-DML-td47557.html
> >>> > > > >> >> >
> >>> > > > >> >> > It will be nice if the new schema-first approach allows us to
> >>> > > > >> >> > work with a single entity object when it comes to the ORMs.
> >>> > > > >> >> > With no need to split the entity into a key and value. Just
> >>> > > > >> >> > want to be sure that the Ignite 3.0 has all the essential
> >>> > > > >> >> > public APIs that would support the single-entity based
> >>> > > > >> >> > approach.
> >>> > > > >> >> >
> >>> > > > >> >> > What do you think?
> >>> > > > >> >> >
> >>> > > > >> >> > -
> >>> > > > >> >> > Denis
> >>> > > > >> >> >
> >>> > > > >> >> >
> >>> > > > >> >> > On Fri, Aug 28, 2020 at 3:50 PM Valentin Kulichenko <
> >>> > > > >> >> > valentin.kulichenko@gmail.com> wrote:
> >>> > > > >> >> >
> >>> > > > >> >> > > Igniters,
> >>> > > > >> >> > >
> >>> > > > >> >> > > One of the big changes proposed for Ignite 3.0 is the
> >>> > > > >> >> > > so-called "schema-first approach". To add more clarity, I've
> >>> > > > >> >> > > started writing the IEP for this change:
> >>> > > > >> >> > >
> >>> > > > >> >> > > https://cwiki.apache.org/confluence/display/IGNITE/IEP-54%3A+Schema-first+Approach
> >>> > > > >> >> > >
> >>> > > > >> >> > > Please take a look and let me know if there are any immediate
> >>> > > > >> >> > > thoughts, suggestions, or objections.
> >>> > > > >> >> > >
> >>> > > > >> >> > > -Val
> >>> > > > >> >> > >
> >>> > > > >> >> >
> >>> > > > >> >>
> >>> > > > >> >
> >>> > > > >>
> >>> > > > >>
> >>> > > > >> --
> >>> > > > >>
> >>> > > > >> Best regards,
> >>> > > > >> Ivan Pavlukhin
> >>> > > > >>
> >>> > > > >
> >>> > > >
> >>> > > >
> >>> > > > --
> >>> > > >
> >>> > > > Best regards,
> >>> > > > Ivan Pavlukhin
> >>> > > >
> >>> > >
> >>> >
> >>>
> >>
> >
>
>
> --
>
> Best regards,
> Ivan Pavlukhin
>


-- 
Best regards,
Andrey V. Mashenkov
