ignite-dev mailing list archives

From Alexey Goncharuk <alexey.goncha...@gmail.com>
Subject Re: IEP-61 Technical discussion
Date Fri, 27 Nov 2020 17:25:43 GMT
Folks, thanks to everyone who joined the call. Summary:

   - We agree that it may be beneficial to separate the metastorage and group
   membership services; however, the abstractions should be clean enough that
   we could implement group membership via the metastorage
   - Production cluster setup will involve an administrator 'init' command
   that will initialize the metastorage raft group. Once the metastorage is
   initialized, all nodes may be restarted arbitrarily
   - An HA cluster must contain at least 3 nodes. A 2-node cluster will stop
   making progress when one of the nodes fails (due to metastorage requirements)
   - We will provide a 'developer' cluster mode which will allow a 1-node
   setup and auto-initialization without the 'init' command
   - We are targeting centralized affinity calculation, with the result stored
   in the metastorage. Metastorage downtime does not necessarily mean cluster
   unavailability (subject to the partition replication protocol choice). It
   would be good to hide the partition object as much as possible so that we
   could support range partitioning in the future
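
To illustrate the 3-node minimum above, here is a minimal sketch of the
majority arithmetic. It assumes the metastorage raft group can only make
progress while a strict majority of its voting members is alive (the usual
Raft rule). The class and method names are hypothetical and are not part of
Ignite.

```java
/** Hypothetical helper; not an Ignite API. */
final class QuorumMath {
    /** Smallest number of voters that forms a strict majority of the group. */
    static int majority(int voters) {
        if (voters < 1) {
            throw new IllegalArgumentException("voters must be >= 1");
        }
        return voters / 2 + 1;
    }

    /** Number of voters that may fail while a majority is still available. */
    static int tolerableFailures(int voters) {
        return voters - majority(voters);
    }

    public static void main(String[] args) {
        // Print the table for small group sizes.
        for (int n = 1; n <= 5; n++) {
            System.out.printf("voters=%d majority=%d tolerable failures=%d%n",
                n, majority(n), tolerableFailures(n));
        }
    }
}
```

Under that assumption, 1 and 2 voters both tolerate zero failures, while 3
voters tolerate one, which matches the points about the 2-node and 3-node
setups.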

To discuss at the next meeting (do not hesitate to send questions here
before the meeting):

   - Raft implementation details (API model, porting, etc)
   - Transactions interaction with replication protocol
   - Weaker consistency options

Please add more if I forgot something and let's choose a time for the next
meeting.

--AG

Thu, 26 Nov 2020 at 16:12, Kseniya Romanova <romanova.ks.spb@gmail.com>:

> Done
>
> Thu, 26 Nov 2020 at 13:18, Ivan Daschinsky <ivandasch@gmail.com>:
>
> > Alexey, is it possible to arrange the call at 16:00 MSK?
> >
> > Thu, 26 Nov 2020 at 12:30, Alexey Goncharuk <alexey.goncharuk@gmail.com>:
> >
> > > Hi Ivan,
> > >
> > > Unfortunately, the earliest window available for us is 12:00 MSK (1 hour
> > > slot), or after 14:30 MSK. Let me know what time works best for you.
> > >
> > > Wed, 25 Nov 2020 at 21:38, Ivan Daschinsky <ivandasch@gmail.com>:
> > >
> > > > Alexey, I kindly ask you to move the meeting a little bit earlier;
> > > > ideally, to the morning.
> > > >
> > > > Wed, 25 Nov 2020 at 20:10, Alexey Goncharuk <alexey.goncharuk@gmail.com>:
> > > >
> > > > > Folks, let's have the call on Friday, Nov 27th at 18:00 MSK? We can use
> > > > > the following waiting room link:
> > > > >
> > > > > https://zoom.us/j/99450012496?pwd=RWZmOGhCNWlRK0ZpamdOOTZsYTJ0dz09
> > > > >
> > > > > Let me know if this time works for everybody.
> > > > >
> > > > > Wed, 25 Nov 2020 at 16:42, Alexey Goncharuk <alexey.goncharuk@gmail.com>:
> > > > >
> > > > > > Folks,
> > > > > >
> > > > > > I've made some edits in IEP-61 [1] regarding the group membership service
> > > > > > and transaction protocol interaction with the replication infrastructure,
> > > > > > please take a look before our Friday call.
> > > > > >
> > > > > > [1]
> > > > > > https://cwiki.apache.org/confluence/display/IGNITE/IEP-61%3A+Common+Replication+Infrastructure
> > > > > >
> > > > > > Mon, 23 Nov 2020 at 13:28, Alexey Goncharuk <alexey.goncharuk@gmail.com>:
> > > > > >
> > > > > >> Thanks, Ivan,
> > > > > >>
> > > > > >> Another protocol for group membership worth checking out is RAPID [1]
> > > > > >> (a recent one). Not sure though if there are any available
> > > > > >> implementations for it already.
> > > > > >>
> > > > > >> [1] https://www.usenix.org/system/files/conference/atc18/atc18-suresh.pdf
> > > > > >>
> > > > > >> Mon, 23 Nov 2020 at 10:46, Ivan Daschinsky <ivandasch@gmail.com>:
> > > > > >>
> > > > > >>> Also, here is some interesting reading about gossip, SWIM etc.
> > > > > >>>
> > > > > >>> 1 -- http://www.cs.cornell.edu/Info/Projects/Spinglass/public_pdfs/SWIM.pdf
> > > > > >>> 2 -- http://www.antonkharenko.com/2015/09/swim-distributed-group-membership.html
> > > > > >>> 3 -- https://github.com/hashicorp/memberlist (Foundation library of
> > > > > >>> hashicorp serf)
> > > > > >>> 4 -- https://github.com/scalecube/scalecube-cluster (Java implementation
> > > > > >>> of SWIM)
> > > > > >>>
> > > > > >>> Thu, 19 Nov 2020 at 16:35, Ivan Daschinsky <ivandasch@gmail.com>:
> > > > > >>>
> > > > > >>> > >> Friday, Nov 27th work for you? If ok, let's have an open call then.
> > > > > >>> > Yes, great
> > > > > >>> > >> As for the protocol port - we will not be dealing with the
> > > > > >>> > >> concurrency...
> > > > > >>> > >> Judging by the Rust port, it seems fairly straightforward.
> > > > > >>> > Yes, they chose to split transport and logic. But the original Go
> > > > > >>> > package from etcd (see raft/node.go) contains some heartbeat mechanism
> > > > > >>> > etc.
> > > > > >>> > I agree with you, this does not seem to be a huge deal to port.
> > > > > >>> >
> > > > > >>> > Thu, 19 Nov 2020 at 16:13, Alexey Goncharuk <alexey.goncharuk@gmail.com>:
> > > > > >>> >
> > > > > >>> >> Ivan,
> > > > > >>> >>
> > > > > >>> >> Agree, let's have a call to discuss the IEP. I have some more thoughts
> > > > > >>> >> regarding how the replication infrastructure works with
> > > > > >>> >> atomic/transactional caches, will put this info to the IEP. Does next
> > > > > >>> >> Friday, Nov 27th work for you? If ok, let's have an open call then.
> > > > > >>> >>
> > > > > >>> >> As for the protocol port - we will not be dealing with the concurrency
> > > > > >>> >> model if we choose this way, this is what I like about their code
> > > > > >>> >> structure. Essentially, the raft module is a single-threaded automaton
> > > > > >>> >> which has callbacks to process a message and to process a tick
> > > > > >>> >> (timeout), and produces messages that should be sent and log entries
> > > > > >>> >> that should be persisted. Judging by the Rust port, it seems fairly
> > > > > >>> >> straightforward. Will be happy to discuss this and other alternatives
> > > > > >>> >> on the call as well.
> > > > > >>> >>
> > > > > >>> >> Thu, 19 Nov 2020 at 14:41, Ivan Daschinsky <ivandasch@gmail.com>:
> > > > > >>> >>
> > > > > >>> >> > > Any existing library that can be used to avoid re-implementing the
> > > > > >>> >> > > protocol ourselves? Perhaps, porting the existing implementation to
> > > > > >>> >> > > Java
> > > > > >>> >> > Personally, I like this idea. Go libraries (either the raft module of
> > > > > >>> >> > etcd or serf by Hashicorp) are famous for clean code, good design,
> > > > > >>> >> > stability, and modest size.
> > > > > >>> >> > But, on the other side, Go has a different model for concurrency and
> > > > > >>> >> > porting probably will not be so straightforward.
> > > > > >>> >> >
> > > > > >>> >> >
> > > > > >>> >> >
> > > > > >>> >> > Thu, 19 Nov 2020 at 13:48, Ivan Daschinsky <ivandasch@gmail.com>:
> > > > > >>> >> >
> > > > > >>> >> > > I'd suggest discussing this IEP and its technical details in an open
> > > > > >>> >> > > Zoom meeting.
> > > > > >>> >> > >
> > > > > >>> >> > > Thu, 19 Nov 2020 at 13:47, Ivan Daschinsky <ivandasch@gmail.com>:
> > > > > >>> >> > >
> > > > > >>> >> > >>
> > > > > >>> >> > >>
> > > > > >>> >> > >> ---------- Forwarded message ---------
> > > > > >>> >> > >> From: Ivan Daschinsky <ivandasch@gmail.com>
> > > > > >>> >> > >> Date: Thu, 19 Nov 2020 at 13:02
> > > > > >>> >> > >> Subject: Re: IEP-61 Technical discussion
> > > > > >>> >> > >> To: Alexey Goncharuk <alexey.goncharuk@gmail.com>
> > > > > >>> >> > >>
> > > > > >>> >> > >>
> > > > > >>> >> > >> Alexey, let's raise another question. Specifically, how nodes
> > > > > >>> >> > >> initially find each other (discovery) and how they detect failures.
> > > > > >>> >> > >>
> > > > > >>> >> > >> I suppose that a gossip protocol is an ideal candidate. For example,
> > > > > >>> >> > >> consul [1] uses this approach, using the serf [2] library to discover
> > > > > >>> >> > >> members of the cluster.
> > > > > >>> >> > >> Then consul forms a raft ensemble (server nodes) and clients use the
> > > > > >>> >> > >> raft ensemble only as a lock service.
> > > > > >>> >> > >>
> > > > > >>> >> > >> PacificA suggests an internal heartbeat mechanism for failure
> > > > > >>> >> > >> detection of a replicated group, but it says nothing about initial
> > > > > >>> >> > >> discovery of nodes.
> > > > > >>> >> > >>
> > > > > >>> >> > >> WDYT?
> > > > > >>> >> > >>
> > > > > >>> >> > >> [1] -- https://www.consul.io/docs/architecture/gossip
> > > > > >>> >> > >> [2] -- https://www.serf.io/
> > > > > >>> >> > >>
> > > > > >>> >> > >> Thu, 19 Nov 2020 at 12:46, Alexey Goncharuk <alexey.goncharuk@gmail.com>:
> > > > > >>> >> > >>
> > > > > >>> >> > >>> Following up the Ignite 3.0 scope/development approach threads, this
> > > > > >>> >> > >>> is a separate thread to discuss technical aspects of the IEP.
> > > > > >>> >> > >>>
> > > > > >>> >> > >>> Let's reiterate one more time on the questions raised by Ivan and
> > > > > >>> >> > >>> also see if there are any other thoughts on the IEP:
> > > > > >>> >> > >>>
> > > > > >>> >> > >>>    - *Whether to deploy metastorage on a separate subset of the nodes
> > > > > >>> >> > >>>    or allow Ignite to choose these nodes automatically.* I think it is
> > > > > >>> >> > >>>    feasible to maintain both modes: by default, Ignite will choose
> > > > > >>> >> > >>>    metastorage nodes automatically which essentially will provide the
> > > > > >>> >> > >>>    same seamless user experience as TCP discovery SPI - no separate
> > > > > >>> >> > >>>    roles, simplistic deployment. For deployments where people want to
> > > > > >>> >> > >>>    have more fine-grained control over the nodes' assignments, we will
> > > > > >>> >> > >>>    provide a runtime configuration which will allow pinning metastorage
> > > > > >>> >> > >>>    group to certain nodes, thus eliminating the latency concerns.
> > > > > >>> >> > >>>    - *Whether there are any TLA+ specs for the PacificA protocol.* Not
> > > > > >>> >> > >>>    to my knowledge, but it is known to be used in production by
> > > > > >>> >> > >>>    Microsoft and other projects, e.g. [1]
> > > > > >>> >> > >>>
> > > > > >>> >> > >>> I would like to collect general feedback on the IEP, as well as
> > > > > >>> >> > >>> feedback on specific parts of it, such as:
> > > > > >>> >> > >>>
> > > > > >>> >> > >>>    - Metastorage API
> > > > > >>> >> > >>>    - Any existing library that can be used to avoid re-implementing the
> > > > > >>> >> > >>>    protocol ourselves? Perhaps, porting the existing implementation to
> > > > > >>> >> > >>>    Java (the way TiKV did with etcd-raft [2] [3]? This is a very neat
> > > > > >>> >> > >>>    way btw in my opinion because I like the finite automata-like
> > > > > >>> >> > >>>    approach of the replication module, and, additionally, we could sync
> > > > > >>> >> > >>>    bug fixes and improvements from the upstream project)
> > > > > >>> >> > >>>
> > > > > >>> >> > >>>
> > > > > >>> >> > >>> Thanks,
> > > > > >>> >> > >>> --AG
> > > > > >>> >> > >>>
> > > > > >>> >> > >>> [1] https://cwiki.apache.org/confluence/display/INCUBATOR/PegasusProposal
> > > > > >>> >> > >>> [2] https://github.com/etcd-io/etcd/tree/master/raft
> > > > > >>> >> > >>> [3] https://github.com/tikv/raft-rs
> > > > > >>> >> > >>>
> > > > > >>> >> > >>
> > > > > >>> >> > >>
> > > > > >>> >> > >> --
> > > > > >>> >> > >> Sincerely yours, Ivan Daschinskiy
> > > > > >>> >> > >>
> > > > > >>> >> > >>
> > > > > >>> >> > >> --
> > > > > >>> >> > >> Sincerely yours, Ivan Daschinskiy
> > > > > >>> >> > >>
> > > > > >>> >> > >
> > > > > >>> >> > >
> > > > > >>> >> > > --
> > > > > >>> >> > > Sincerely yours, Ivan Daschinskiy
> > > > > >>> >> > >
> > > > > >>> >> >
> > > > > >>> >> >
> > > > > >>> >> > --
> > > > > >>> >> > Sincerely yours, Ivan Daschinskiy
> > > > > >>> >> >
> > > > > >>> >>
> > > > > >>> >
> > > > > >>> >
> > > > > >>> > --
> > > > > >>> > Sincerely yours, Ivan Daschinskiy
> > > > > >>> >
> > > > > >>>
> > > > > >>>
> > > > > >>> --
> > > > > >>> Sincerely yours, Ivan Daschinskiy
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > > >
> > > > --
> > > > Sincerely yours, Ivan Daschinskiy
> > > >
> > >
> >
> >
> > --
> > Sincerely yours, Ivan Daschinskiy
> >
>
