ignite-dev mailing list archives

From Denis Mekhanikov <dmekhani...@gmail.com>
Subject Re: Asynchronous registration of binary metadata
Date Wed, 14 Aug 2019 16:49:18 GMT

I still don’t completely understand whether, by using the metastore, we are going to stop using discovery
for metadata registration or not. Could you clarify that point?
Is it going to be a distributed metastore or a local one?

Are there any relevant JIRA tickets for this change?


> On 14 Aug 2019, at 19:37, Alexei Scherbakov <alexey.scherbakoff@gmail.com> wrote:
> Denis Mekhanikov,
> 1. Yes, only on OS failures. In such a case the data will be received from alive
> nodes later.
> 2. Yes, for walmode=FSYNC writes to the metastore will be slow. But this mode
> should not be used if you have more than two nodes in the grid, because it has a
> huge impact on performance.
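For reference, a minimal sketch of where this setting lives, assuming the Ignite 2.x configuration API (LOG_ONLY shown as the non-fsync alternative):

    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.DataRegionConfiguration;
    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.configuration.WALMode;

    public class WalModeExample {
        public static void main(String[] args) {
            // FSYNC forces a sync to the physical device on every WAL write;
            // LOG_ONLY delegates flushing to the OS page cache and is much cheaper.
            DataStorageConfiguration storageCfg = new DataStorageConfiguration()
                .setWalMode(WALMode.LOG_ONLY)
                .setDefaultDataRegionConfiguration(
                    new DataRegionConfiguration().setPersistenceEnabled(true));

            IgniteConfiguration cfg = new IgniteConfiguration()
                .setDataStorageConfiguration(storageCfg);

            Ignition.start(cfg);
        }
    }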
> Wed, 14 Aug 2019 at 14:29, Denis Mekhanikov <dmekhanikov@gmail.com>:
>> Folks,
>> Thanks for showing interest in this issue!
>> Alexey,
>>> I think removing fsync could help to mitigate performance issues with the
>> current implementation
>> Is my understanding correct that if we remove fsync, then discovery won’t
>> be blocked, data will be flushed to disk in the background, and loss of
>> information will be possible only on an OS failure? It sounds like an
>> acceptable workaround to me.
>> Will moving metadata to the metastore actually resolve this issue? Please
>> correct me if I’m wrong, but we will still need to write the information to
>> the WAL before releasing the discovery thread. If the WAL mode is FSYNC, then
>> the issue will still be there. Or is it planned to abandon the discovery-based
>> protocol altogether?
>> Evgeniy, Ivan,
>> In my particular case the data wasn’t too big. It was a slow virtualised
>> disk with encryption that made operations slow. Given that there are 200
>> nodes in the cluster, every node writes slowly, and this process is
>> sequential, a single piece of metadata is registered extremely slowly.
>> Ivan, answering your other questions:
>>> 2. Do we need persistent metadata for in-memory caches? Or is it so by accident?
>> It should be checked whether it’s safe to stop writing marshaller mappings to
>> disk without losing any guarantees.
>> But anyway, I would like to have a property that would control this. If
>> metadata registration is slow, then the initial cluster warmup may take a
>> while. So, if we preserve metadata on disk, we will need to warm it up
>> only once, and further restarts won’t be affected.
>>> Do we really need a fast fix here?
>> I would like a fix that could be implemented now, since the activity of
>> moving metadata to the metastore doesn’t sound like a quick one. Having a
>> temporary solution would be nice.
>> Denis
>>> On 14 Aug 2019, at 11:53, Ivan Pavlukhin <vololo100@gmail.com> wrote:
>>> Denis,
>>> Several clarifying questions:
>>> 1. Do you have an idea why metadata registration takes so long? Are the
>>> disks that poor? Is there that much data to write? Is there contention with
>>> disk writes by other subsystems?
>>> 2. Do we need persistent metadata for in-memory caches? Or is it so by
>>> accident?
>>> Generally, I think that it is possible to move metadata saving
>>> operations out of the discovery thread without losing the required
>>> consistency/integrity.
>>> As Alex mentioned, using the metastore looks like a better solution. Do we
>>> really need a fast fix here? (Are we talking about a fast fix?)
>>> Wed, 14 Aug 2019 at 11:45, Zhenya Stanilovsky <arzamas123@mail.ru.invalid>:
>>>> Alexey, but in this case the customer needs to be informed that a whole
>>>> (for example 1-node) cluster crash (power off) could lead to partial data
>>>> unavailability.
>>>> And maybe further index corruption.
>>>> 1. Why does your metadata take up a substantial amount of space? Maybe a context leak?
>>>> 2. Could the metadata be compressed?
>>>>> Wednesday, 14 August 2019, 11:22 +03:00 from Alexei Scherbakov
>>>>> <alexey.scherbakoff@gmail.com>:
>>>>> Denis Mekhanikov,
>>>>> Currently metadata is fsync'ed on write. This might be the cause of
>>>>> slow-downs in case of burst metadata writes.
>>>>> I think removing fsync could help to mitigate performance issues with the
>>>>> current implementation until the proper solution is implemented:
>>>>> moving metadata to the metastore.
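To make the cost under discussion concrete: a minimal plain-Java sketch (not Ignite code) of the file-system primitive behind the two behaviours, with and without an explicit fsync:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class FsyncCost {
        static void write(Path file, byte[] data, boolean fsync) throws IOException {
            try (FileChannel ch = FileChannel.open(file,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
                ch.write(ByteBuffer.wrap(data));
                if (fsync)
                    ch.force(true); // blocks until the data reaches the physical device
                // without force(), the OS flushes its page cache in the background,
                // so the data may be lost on an OS or power failure
            }
        }
    }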
>>>>> Tue, 13 Aug 2019 at 17:09, Denis Mekhanikov <dmekhanikov@gmail.com>:
>>>>>> I would also like to mention that marshaller mappings are written to
>>>>>> disk even if persistence is disabled.
>>>>>> So, this issue affects purely in-memory clusters as well.
>>>>>> Denis
>>>>>>> On 13 Aug 2019, at 17:06, Denis Mekhanikov <dmekhanikov@gmail.com> wrote:
>>>>>>> Hi!
>>>>>>> When persistence is enabled, binary metadata is written to disk on
>>>>>>> registration. Currently this happens in the discovery thread, which makes
>>>>>>> processing of related messages very slow.
>>>>>>> There are cases when a lot of nodes and slow disks can make the
>>>>>>> registration of a binary type take several minutes. Plus it blocks
>>>>>>> processing of other messages.
>>>>>>> I propose starting a separate thread that will be responsible for
>>>>>>> writing binary metadata to disk. So, binary type registration will be
>>>>>>> considered finished before the information about it is written to disk on
>>>>>>> all nodes.
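A minimal sketch of the proposed hand-off, with hypothetical names (illustrative only, not the actual Ignite internals):

    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Hypothetical sketch: the discovery thread registers the type in memory
    // and hands the disk write off to a dedicated single-threaded executor,
    // getting back a future it does not wait on.
    class BinaryMetadataWriter {
        private final ExecutorService writer =
            Executors.newSingleThreadExecutor(r -> new Thread(r, "binary-metadata-writer"));

        CompletableFuture<Void> writeAsync(int typeId, byte[] marshalledMeta) {
            return CompletableFuture.runAsync(() -> writeToDisk(typeId, marshalledMeta), writer);
        }

        private void writeToDisk(int typeId, byte[] marshalledMeta) {
            // actual file I/O (and fsync, if required) happens here,
            // off the discovery thread
        }
    }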
>>>>>>> The main concern here is data consistency in cases when a node
>>>>>>> acknowledges type registration and then fails before writing the metadata
>>>>>>> to disk.
>>>>>>> I see two parts of this issue:
>>>>>>> 1. Nodes will have different metadata after restarting.
>>>>>>> 2. If we write some data into a persisted cache and shut down nodes faster
>>>>>>> than a new binary type is written to disk, then after a restart we won’t
>>>>>>> have a binary type to work with.
>>>>>>> The first case is similar to a situation when one node fails, and after
>>>>>>> that a new type is registered in the cluster. This issue is resolved by the
>>>>>>> discovery data exchange. All nodes receive information about all binary
>>>>>>> types in the initial discovery messages sent by other nodes. So, when you
>>>>>>> restart a node, it will receive the information that it failed to finish
>>>>>>> writing to disk from other nodes.
>>>>>>> If all nodes shut down before finishing writing the metadata to disk,
>>>>>>> then after a restart the type will be considered unregistered, so another
>>>>>>> registration will be required.
>>>>>>> The second case is a bit more complicated. But it can be resolved by
>>>>>>> making the discovery thread on every node create a future that will be
>>>>>>> completed when writing to disk is finished. So, every node will have such
>>>>>>> a future, reflecting the current state of persisting the metadata to
>>>>>>> disk.
>>>>>>> After that, if some operation needs this binary type, it will need to
>>>>>>> wait on that future until flushing to disk is finished.
>>>>>>> This way discovery threads won’t be blocked, but other threads that
>>>>>>> actually need this type will be.
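Sketched continuation of the hand-off above (again with hypothetical names): the per-type future is what other threads wait on before using the type.

    import java.util.Map;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical sketch: the future returned by the writer is tracked per
    // type; the discovery thread returns immediately, and only the threads
    // that actually need the type block on the pending write.
    class MetadataWriteTracker {
        private final Map<Integer, CompletableFuture<Void>> pendingWrites = new ConcurrentHashMap<>();

        // Called from the discovery thread: record the in-flight write and move on.
        void onMetadataRegistered(int typeId, CompletableFuture<Void> writeFut) {
            pendingWrites.put(typeId, writeFut);
            writeFut.thenRun(() -> pendingWrites.remove(typeId));
        }

        // Called from threads that actually use the type (e.g. cache operations).
        void awaitMetadataWritten(int typeId) {
            CompletableFuture<Void> fut = pendingWrites.get(typeId);
            if (fut != null)
                fut.join(); // block until the metadata is flushed to disk
        }
    }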
>>>>>>> Please let me know what you think about that.
>>>>>>> Denis
>>>>> --
>>>>> Best regards,
>>>>> Alexei Scherbakov
>>>> --
>>>> Zhenya Stanilovsky
>>> --
>>> Best regards,
>>> Ivan Pavlukhin
> -- 
> Best regards,
> Alexei Scherbakov
