ignite-dev mailing list archives

From Denis Magda <dma...@gridgain.com>
Subject Re: Service grid redesign
Date Thu, 05 Apr 2018 19:13:36 GMT
>
> There is no need to deserialize services on the coordinator. It should only
> be able to calculate the assignments.
> *LazyServiceConfiguration *should be used to deliver the service
> configurations, just like it is done right now.


Can that configuration be tweaked over time, requiring the class to be
updated on all the nodes (if, for instance, someone wants to deploy the next
version of a service)? Just want to be sure we don't need to restart the
cluster nodes (the ones that won't be used for service deployments) on
service-related configuration changes.
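
For context, here is roughly what a "next version" rollout looks like
today, as far as I understand it: cancel and redeploy, which only works if
the new implementation class (MyServiceV2 below is a made-up name) is
already on the classpath of every candidate node. A minimal sketch:

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.services.ServiceConfiguration;

public class RedeployExample {
    public static void main(String[] args) {
        Ignite ignite = Ignition.ignite(); // connect to the started node

        // Stop the old version first; deploy() alone won't replace a
        // running service with the same name.
        ignite.services().cancel("mySvc");

        ServiceConfiguration cfg = new ServiceConfiguration();
        cfg.setName("mySvc");
        cfg.setService(new MyServiceV2()); // hypothetical new version class
        cfg.setTotalCount(1);

        ignite.services().deploy(cfg);
    }
}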

--
Denis

On Thu, Apr 5, 2018 at 8:18 AM, Denis Mekhanikov <dmekhanikov@gmail.com>
wrote:

> Denis,
> There is no need to deserialize services on the coordinator. It should only
> be able to calculate the assignments.
> *LazyServiceConfiguration *should be used to deliver the service
> configurations, just like it is done right now.
>
> Val,
> Using DeploymentSpi is a good idea; I didn't think about this possibility.
> This is a viable alternative to peer class loading, though not as
> user-friendly.
> But if peer class loading is that hard to implement, then I vote for
> DeploymentSpi.
> As far as I understand, it won't require any additional changes in
> Ignite, but it will make users think about using a proper DeploymentSpi.
> Please correct me if I'm wrong.
> It would be good, though, to add some examples of service redeployment
> when the implementation class changes.
>
> Denis
>
> Thu, Apr 5, 2018 at 2:33, Valentin Kulichenko
> <valentin.kulichenko@gmail.com>:
>
> > I don't think peer class loading is even possible for services. I believe
> > we should reuse DeploymentSpi [1] for versioning.
> >
> > [1] https://apacheignite.readme.io/docs/deployment-spi
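> >
> > A minimal sketch of wiring it in, assuming the ignite-urideploy module is
> > on the classpath and the directory path below (made up) holds the
> > deployment units:
> >
> > import java.util.Collections;
> > import org.apache.ignite.Ignition;
> > import org.apache.ignite.configuration.IgniteConfiguration;
> > import org.apache.ignite.spi.deployment.uri.UriDeploymentSpi;
> >
> > UriDeploymentSpi deploymentSpi = new UriDeploymentSpi();
> > // The SPI scans this location; redeploying a new service version means
> > // dropping an updated package into the directory.
> > deploymentSpi.setUriList(
> >     Collections.singletonList("file:///opt/ignite/deployment"));
> >
> > IgniteConfiguration cfg = new IgniteConfiguration();
> > cfg.setDeploymentSpi(deploymentSpi);
> > Ignition.start(cfg);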
> >
> > -Val
> >
> > On Wed, Apr 4, 2018 at 12:52 PM, Denis Magda <dmagda@gridgain.com>
> > wrote:
> >
> > > Sorry, that was me who renamed the IEP to "Oil Change in Service Grid". I
> > > was writing this email after the renaming. I like that title more because
> > > it's fun and highlights what we intend to do - cleaning our service grid
> > > engine and powering it up with new "liquid" (a new communication and
> > > deployment approach not available before).
> > >
> > > Denis
> > >
> > >
> > > > This message contains the serialized service instance and its
> > > > configuration. It is delivered to the coordinator node first, which
> > > > calculates the service deployment assignments and adds this information
> > > > to the message.
> > >
> > >
> > > I would consider using a NodeFilter first to decide where a service can
> > > potentially be deployed. Otherwise, we would require service classes to be
> > > on every node (every node might become a coordinator), which is not a
> > > desirable requirement.
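> > >
> > > A sketch of what I mean, assuming service-hosting nodes are marked with a
> > > user attribute (the "svc.host" attribute name and MyServiceImpl class are
> > > made up):
> > >
> > > ServiceConfiguration cfg = new ServiceConfiguration();
> > > cfg.setName("mySvc");
> > > cfg.setService(new MyServiceImpl());
> > > cfg.setTotalCount(2);
> > > // Only nodes started with setUserAttributes(singletonMap("svc.host",
> > > // "true")) are eligible, so the rest of the cluster never needs the
> > > // service classes.
> > > cfg.setNodeFilter(node -> "true".equals(node.attribute("svc.host")));
> > > ignite.services().deploy(cfg);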
> > >
> > >
> > > As for peer class loading, I would back up Dmitriy here. Let's at
> > > least not focus on this task for now. We should design service
> > > versioning in the right way first and support it.
> > >
> > > --
> > > Denis
> > >
> > >
> > >
> > > On Wed, Apr 4, 2018 at 12:20 PM, Dmitriy Setrakyan <dsetrakyan@apache.org>
> > > wrote:
> > >
> > > > Here is the correct link:
> > > > https://cwiki.apache.org/confluence/display/IGNITE/IEP-17%3A+Oil+Change+in+Service+Grid
> > > >
> > > > I have looked at the tickets there, and I believe that we should not
> > > > support peer-deployment for services. It is very hard and I do not think
> > > > we should even try.
> > > >
> > > > I am proposing closing this ticket as Won't Fix -
> > > > https://issues.apache.org/jira/browse/IGNITE-975
> > > >
> > > > D.
> > > >
> > > > On Wed, Apr 4, 2018 at 5:39 AM, Denis Mekhanikov <dmekhanikov@gmail.com>
> > > > wrote:
> > > >
> > > > > Vyacheslav,
> > > > >
> > > > > I've just posted my first draft of the IEP:
> > > > > https://cwiki.apache.org/confluence/display/IGNITE/IEP-17%3A+Service+grid+improvements
> > > > > It's not finished yet, but you can get the idea from it.
> > > > > If you have any thoughts, please let me know and I'll add them
> > > > > to the IEP.
> > > > >
> > > > > Denis
> > > > >
> > > > > Wed, Apr 4, 2018 at 13:09, Vyacheslav Daradur <daradurvs@gmail.com>:
> > > > >
> > > > > > Denis, thanks for the link.
> > > > > >
> > > > > > I looked through the task and I think that I understand your
> > > > > > redesign point now.
> > > > > >
> > > > > > Do you have a clear plan or IEP for the whole redesign?
> > > > > >
> > > > > > I'm interested in this component and I'd like to take part in the
> > > > > > development.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Mon, Apr 2, 2018 at 2:55 PM, Denis Mekhanikov <dmekhanikov@gmail.com>
> > > > > > wrote:
> > > > > > > Vyacheslav,
> > > > > > >
> > > > > > > The service deployment design, based on a replicated utility cache,
> > > > > > > has proven to be unstable and deadlock-prone.
> > > > > > > You can find a list of JIRA issues connected to it in my previous
> > > > > > > letter.
> > > > > > >
> > > > > > > The intention behind it is similar to the binary metadata redesign
> > > > > > > that happened in the following ticket: IGNITE-4157
> > > > > > > <https://issues.apache.org/jira/browse/IGNITE-4157>
> > > > > > > This change in the service deployment procedure will eliminate the
> > > > > > > need for another internal replicated cache
> > > > > > > and make service deployment more reliable on unstable topology.
> > > > > > >
> > > > > > > Denis
> > > > > > >
> > > > > > > Tue, Mar 27, 2018 at 23:21, Vyacheslav Daradur <daradurvs@gmail.com>:
> > > > > > >
> > > > > > >> Hi, Denis Mekhanikov!
> > > > > > >>
> > > > > > >> As far as I know, Ignite services are based on IgniteCache and we
> > > > > > >> have all its features. We can use listeners or continuous queries
> > > > > > >> for deployment synchronization.
> > > > > > >>
> > > > > > >> Why do you want to use the discovery layer for that?
> > > > > > >>
> > > > > > >> One more thing: we can use a baseline approach for services, which
> > > > > > >> means *IgniteService.deploy()* returns a ready-to-work service after
> > > > > > >> deployment on the baseline nodes, and deploys to other nodes on
> > > > > > >> demand, for example when the deployed service's load becomes high.
> > > > > > >>
> > > > > > >> About versioning: maybe it makes sense to extend the public API:
> > > > > > >> IgniteServices.service(name, *version*)?
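> > > > > > >>
> > > > > > >> Something like this, perhaps (the overload is hypothetical, not in
> > > > > > >> the current API):
> > > > > > >>
> > > > > > >> <T> T service(String name);              // existing lookup
> > > > > > >> <T> T service(String name, int version); // proposed: pin a version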
> > > > > > >>
> > > > > > >> At first deployment, we can compute the service's hashcode (just as
> > > > > > >> an example) and store it. After a new deployment request for a
> > > > > > >> service with an existing name, we will compute the new service's
> > > > > > >> hashcode and compare the two; if the hashcodes differ, we will deploy
> > > > > > >> the new service as a service with a different version.
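> > > > > > >>
> > > > > > >> A sketch of that idea, assuming "hashcode" means a digest of the
> > > > > > >> marshalled service (all names below are hypothetical):
> > > > > > >>
> > > > > > >> import java.security.MessageDigest;
> > > > > > >> import java.util.Arrays;
> > > > > > >> import java.util.Map;
> > > > > > >> import java.util.concurrent.ConcurrentHashMap;
> > > > > > >>
> > > > > > >> class ServiceVersions {
> > > > > > >>     private final Map<String, byte[]> digests = new ConcurrentHashMap<>();
> > > > > > >>
> > > > > > >>     /** True if the serialized form under this name has changed. */
> > > > > > >>     boolean isNewVersion(String name, byte[] serializedSvc) throws Exception {
> > > > > > >>         byte[] digest = MessageDigest.getInstance("SHA-256").digest(serializedSvc);
> > > > > > >>         byte[] prev = digests.put(name, digest);
> > > > > > >>         return prev != null && !Arrays.equals(prev, digest);
> > > > > > >>     }
> > > > > > >> }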
> > > > > > >>
> > > > > > >>
> > > > > > >> On Fri, Mar 23, 2018 at 10:03 PM, Denis Magda <dmagda@apache.org>
> > > > > > >> wrote:
> > > > > > >> > Denis,
> > > > > > >> >
> > > > > > >> > Thanks for the extensive analysis. There is vast room for
> > > > > > >> > optimization on the service grid side.
> > > > > > >> >
> > > > > > >> > Yakov, Sam, Alex G.,
> > > > > > >> >
> > > > > > >> > How do you like the idea of using the discovery protocol for the
> > > > > > >> > service grid system message exchange? Any pitfalls?
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > --
> > > > > > >> > Denis
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > On Fri, Mar 23, 2018 at 8:01 AM, Denis Mekhanikov
> > > > > > >> > <dmekhanikov@gmail.com> wrote:
> > > > > > >> >
> > > > > > >> >> Igniters,
> > > > > > >> >>
> > > > > > >> >> I'd like to start a discussion on the Ignite service grid redesign.
> > > > > > >> >> We have a number of problems in our current architecture that have
> > > > > > >> >> to be addressed.
> > > > > > >> >>
> > > > > > >> >> Here are the most severe ones:
> > > > > > >> >>
> > > > > > >> >> One of them is the lack of a guarantee that a service is
> > > > > > >> >> successfully deployed and ready for work by the time the
> > > > > > >> >> *IgniteService.deploy*()* methods return.
> > > > > > >> >> Furthermore, if an exception is thrown from the *Service.init()*
> > > > > > >> >> method, then the deploying side is not able to receive it, or even
> > > > > > >> >> understand that the service is in an unusable state.
> > > > > > >> >> So, you may end up in a situation where you deployed a service
> > > > > > >> >> without receiving any errors, then called a service's method, and
> > > > > > >> >> hung indefinitely on this invocation.
> > > > > > >> >> JIRA ticket: https://issues.apache.org/jira/browse/IGNITE-3392
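> > > > > > >> >>
> > > > > > >> >> To illustrate with a made-up failing service: deploying the class
> > > > > > >> >> below reports no error to the caller, yet any later proxy call may
> > > > > > >> >> hang.
> > > > > > >> >>
> > > > > > >> >> import org.apache.ignite.services.Service;
> > > > > > >> >> import org.apache.ignite.services.ServiceContext;
> > > > > > >> >>
> > > > > > >> >> class BrokenService implements Service {
> > > > > > >> >>     @Override public void init(ServiceContext ctx) {
> > > > > > >> >>         // Fails on the hosting node; the deploying side never sees it.
> > > > > > >> >>         throw new IllegalStateException("missing dependency");
> > > > > > >> >>     }
> > > > > > >> >>     @Override public void execute(ServiceContext ctx) { /* never reached */ }
> > > > > > >> >>     @Override public void cancel(ServiceContext ctx) { /* no-op */ }
> > > > > > >> >> }
> > > > > > >> >>
> > > > > > >> >> // Returns normally despite the failed init():
> > > > > > >> >> ignite.services().deployClusterSingleton("broken", new BrokenService());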
> > > > > > >> >>
> > > > > > >> >> Another problem is locking during service deployment on unstable
> > > > > > >> >> topology.
> > > > > > >> >> This issue is caused by missing updates in continuous query
> > > > > > >> >> listeners on the internal cache.
> > > > > > >> >> It is hard to reproduce, but it happens sometimes. We shouldn't
> > > > > > >> >> allow the possibility of deployment methods hanging without saying
> > > > > > >> >> anything.
> > > > > > >> >> JIRA ticket: https://issues.apache.org/jira/browse/IGNITE-6259
> > > > > > >> >>
> > > > > > >> >> I think we should change the deployment procedure to make it more
> > > > > > >> >> reliable.
> > > > > > >> >> Moving from operating over an internal replicated service cache to
> > > > > > >> >> sending custom discovery events seems to be a good idea.
> > > > > > >> >> Service deployment may trigger a discovery event that will make the
> > > > > > >> >> chosen nodes deploy the service, and the same event will notify
> > > > > > >> >> other nodes about the deployed service instances.
> > > > > > >> >> It will eliminate the need for distributed transactions on the
> > > > > > >> >> internal replicated system cache and make the service deployment
> > > > > > >> >> protocol more transparent.
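> > > > > > >> >>
> > > > > > >> >> Roughly, the data such a custom discovery message might carry (all
> > > > > > >> >> names are hypothetical, just to make the proposal concrete):
> > > > > > >> >>
> > > > > > >> >> import java.io.Serializable;
> > > > > > >> >> import java.util.Map;
> > > > > > >> >> import java.util.UUID;
> > > > > > >> >>
> > > > > > >> >> class ServiceDeploymentMessage implements Serializable {
> > > > > > >> >>     String name;                    // service name
> > > > > > >> >>     byte[] svcAndCfgBytes;          // serialized service + configuration
> > > > > > >> >>     Map<UUID, Integer> assignments; // nodeId -> instance count; filled
> > > > > > >> >>                                     // in by the coordinator
> > > > > > >> >> }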
> > > > > > >> >>
> > > > > > >> >> There are a few points that should be taken into account, though.
> > > > > > >> >>
> > > > > > >> >> First of all, we can't wait for services to be deployed and
> > > > > > >> >> initialised in the discovery thread.
> > > > > > >> >> So, we need to make the notification about the service deployment
> > > > > > >> >> result asynchronous, presumably over the communication protocol.
> > > > > > >> >> I can think of a procedure similar to the current exchange protocol,
> > > > > > >> >> where service deployment is initiated with an initial discovery
> > > > > > >> >> message, followed by asynchronous notifications from the hosting
> > > > > > >> >> servers over communication. And finally, one more discovery message
> > > > > > >> >> will notify all nodes about the service deployment result and the
> > > > > > >> >> location of the deployed service instances. The coordinator will be
> > > > > > >> >> responsible for collecting the deployment results in this scheme.
> > > > > > >> >>
> > > > > > >> >> Another problem is failover in the case when some nodes fail
> > > > > > >> >> during deployment or further work.
> > > > > > >> >> The following cases should be handled:
> > > > > > >> >>
> > > > > > >> >>    1. coordinator failure during deployment;
> > > > > > >> >>    2. failure of nodes that were chosen to host the service, during
> > > > > > >> >>    deployment;
> > > > > > >> >>    3. failure of nodes that contain deployed services, after the
> > > > > > >> >>    deployment.
> > > > > > >> >>
> > > > > > >> >> The first case may be resolved either by continuing the deployment
> > > > > > >> >> with a new coordinator, or by cancelling it.
> > > > > > >> >> The second case will require another node to be chosen and notified.
> > > > > > >> >> Maybe another discovery message will be needed.
> > > > > > >> >> The third case will require redeployment, so the coordinator should
> > > > > > >> >> track topology changes and redeploy failed services.
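> > > > > > >> >>
> > > > > > >> >> For the third case, roughly (redeployServicesFrom() is a
> > > > > > >> >> hypothetical helper; assumes EVT_NODE_LEFT/EVT_NODE_FAILED are
> > > > > > >> >> enabled via IgniteConfiguration.setIncludeEventTypes):
> > > > > > >> >>
> > > > > > >> >> import org.apache.ignite.events.DiscoveryEvent;
> > > > > > >> >> import org.apache.ignite.events.EventType;
> > > > > > >> >>
> > > > > > >> >> ignite.events().localListen(evt -> {
> > > > > > >> >>     // On the coordinator: re-run assignment for services hosted on
> > > > > > >> >>     // the node that just left or failed.
> > > > > > >> >>     redeployServicesFrom(((DiscoveryEvent)evt).eventNode().id());
> > > > > > >> >>     return true; // stay subscribed
> > > > > > >> >> }, EventType.EVT_NODE_LEFT, EventType.EVT_NODE_FAILED);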
> > > > > > >> >>
> > > > > > >> >> Another good improvement would be service versioning. This matter
> > > > > > >> >> was already discussed in another thread:
> > > > > > >> >> http://apache-ignite-developers.2346864.n4.nabble.com/Service-versioning-td20858.html
> > > > > > >> >> Let's resume this discussion and state the final decision here.
> > > > > > >> >> This feature is closely connected to peer class loading, which is
> > > > > > >> >> not working for services currently.
> > > > > > >> >> So, service versioning should be implemented along with peer class
> > > > > > >> >> loading.
> > > > > > >> >> JIRA ticket for versioning:
> > > > > > >> >> https://issues.apache.org/jira/browse/IGNITE-6069
> > > > > > >> >> Peer class loading: https://issues.apache.org/jira/browse/IGNITE-975
> > > > > > >> >>
> > > > > > >> >> Please share your thoughts. Constructive criticism is highly
> > > > > > >> >> appreciated.
> > > > > > >> >>
> > > > > > >> >> Denis
> > > > > > >> >>
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >> --
> > > > > > >> Best Regards, Vyacheslav D.
> > > > > > >>
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best Regards, Vyacheslav D.
> > > > > >
> > > > >
> > > >
> > >
> >
>
