ignite-dev mailing list archives

From Nikolay Izhikov <nizhi...@apache.org>
Subject Re: Exchange stucks while node restoring state from WAL
Date Fri, 03 Aug 2018 07:28:06 GMT
Hello, Maxim.

> 1) Is it correct that readMetastore() happens after node starts
> but before including node into the ring?

I think so, yes.
Some metainformation may be required when the node joins.

> 5) In our final solution, should readMetastore and restoreMemory
> for a newly joined node be performed in one step?

I think not.

Metainformation may be required in order to restore memory,
so we have to restore the metainformation in a first step and the whole memory in a second step.
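
As a rough illustration of that ordering, here is a minimal sketch. It is hypothetical: the class, the method bodies and the "walPointer" entry are made up for this example and are not Ignite code. It only shows that the second step depends on the result of the first.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the two-step ordering; not an Ignite class. */
class TwoStepRestoreSketch {
    /** Records the order in which the steps ran. */
    final List<String> steps = new ArrayList<>();

    /** Metainformation recovered in step one; null until readMetastore() runs. */
    private Map<String, String> meta;

    /** Step 1: read only the metainformation (stubbed with a made-up entry). */
    void readMetastore() {
        meta = new HashMap<>();
        meta.put("walPointer", "42"); // made-up key and value, for illustration
        steps.add("readMetastore");
    }

    /** Step 2: restore the whole memory; refuses to run without step 1. */
    void restoreMemory() {
        if (meta == null)
            throw new IllegalStateException("readMetastore() must run first");
        steps.add("restoreMemory(walPointer=" + meta.get("walPointer") + ")");
    }

    public static void main(String[] args) {
        TwoStepRestoreSketch node = new TwoStepRestoreSketch();
        node.readMetastore();  // first step
        node.restoreMemory();  // second step
        System.out.println(node.steps);
    }
}
```

The sketch says nothing about where the ring join or PME falls relative to the second step; that part is still open.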

On Fri, 03/08/2018 at 09:44 +0300, Maxim Muzafarov wrote:
> Hi Igniters,
> 
> 
> I'm working on bug [1] and have some questions about the final
> implementation. I have probably already found answers to some of
> them, but I want to be sure. Please help me clarify the details.
> 
> 
> The key problem here is that we read the WAL and restore the
> memory state of a newly joined node inside PME. Reading the WAL
> can take a huge amount of time, so the whole cluster gets stuck
> waiting for a single node.
> 
> 
> 1) Is it correct that readMetastore() happens after node starts
> but before including node into the ring?
> 
> 2) After onDone() has been called for LocalJoinFuture on the local
> node, can we proceed with initiating PME on that node?
> 
> 3) After reading the checkpoint and restoring memory for a newly
> joined node, how and when do we update the obsolete partition
> update counters? During historical rebalance, right?
> 
> 4) Should we call restoreMemory for a newly joined node before PME
> is initiated on the other nodes in the cluster?
> 
> 5) In our final solution, should readMetastore and restoreMemory
> for a newly joined node be performed in one step?
> 
> 
> [1] https://issues.apache.org/jira/browse/IGNITE-7196