ignite-dev mailing list archives

From ткаленко кирилл <tkalkir...@yandex.ru>
Subject Re: [DISCUSSION] Cache warmup
Date Wed, 05 Aug 2020 06:37:56 GMT
Hi, Stas!

After talking with Anton and Alexy about IEP-40, I changed the description of the implementation
in the form of a response to Slava, here [1]. In short, I made three separate interfaces: the
first is public, for strategy configuration; the second is internal, for strategy implementation;
and the third is for possible delivery of strategies from different plugins.
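
To illustrate the split into three interfaces, here is a minimal self-contained sketch. It is only
an illustration, not the code from the description in [1]: apart from "WarmUpStrategy#warmUp",
which is mentioned further down in this thread, every name (WarmUpConfiguration,
LoadAllWarmUpConfiguration, WarmUpStrategySupplier, WarmUpStrategyRegistry) is hypothetical, and
the internal API is reduced to a region name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Public part (hypothetical): a marker type that a user puts into the node configuration. */
interface WarmUpConfiguration {
}

/** Hypothetical built-in configuration that selects a "load all" strategy. */
final class LoadAllWarmUpConfiguration implements WarmUpConfiguration {
}

/** Internal part: the code that performs the warm-up for one configuration type. */
interface WarmUpStrategy<T extends WarmUpConfiguration> {
    Class<T> configClass();

    void warmUp(T cfg, String regionName);
}

/** Plugin part (hypothetical): lets a plugin deliver additional strategies. */
interface WarmUpStrategySupplier {
    List<WarmUpStrategy<?>> strategies();
}

/** Collects strategies from suppliers and finds the one matching a configuration. */
final class WarmUpStrategyRegistry {
    private final Map<Class<?>, WarmUpStrategy<?>> byCfgCls = new HashMap<>();

    void register(WarmUpStrategySupplier supplier) {
        for (WarmUpStrategy<?> s : supplier.strategies())
            byCfgCls.put(s.configClass(), s);
    }

    @SuppressWarnings("unchecked")
    <T extends WarmUpConfiguration> WarmUpStrategy<T> resolve(T cfg) {
        WarmUpStrategy<?> s = byCfgCls.get(cfg.getClass());

        if (s == null)
            throw new IllegalArgumentException("No warm-up strategy for " + cfg.getClass().getName());

        return (WarmUpStrategy<T>) s;
    }
}

final class WarmUpInterfacesDemo {
    /** Registers a toy "load all" strategy and returns the regions it was asked to warm up. */
    static List<String> run() {
        List<String> warmed = new ArrayList<>();

        WarmUpStrategy<LoadAllWarmUpConfiguration> loadAll = new WarmUpStrategy<LoadAllWarmUpConfiguration>() {
            @Override public Class<LoadAllWarmUpConfiguration> configClass() {
                return LoadAllWarmUpConfiguration.class;
            }

            @Override public void warmUp(LoadAllWarmUpConfiguration cfg, String regionName) {
                warmed.add(regionName); // A real strategy would read pages into memory here.
            }
        };

        WarmUpStrategyRegistry registry = new WarmUpStrategyRegistry();
        registry.register(() -> Collections.<WarmUpStrategy<?>>singletonList(loadAll));

        LoadAllWarmUpConfiguration cfg = new LoadAllWarmUpConfiguration();
        registry.resolve(cfg).warmUp(cfg, "default");

        return warmed;
    }
}
```

The user only ever touches the configuration type; the strategy and the supplier stay on the
internal and plugin side.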

I will try to think about this and implement it. The warm-up phase will run before "discovery",
and I'm not sure that it will be possible to connect via control.sh at that point; perhaps it
will be possible via JMX, although I think it would be better via control.sh.
> Will there be a way to interrupt warmup phase and continue startup (e.g. via JMX, REST
> and/or control.sh)? Can we have it please?
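
As a rough sketch of what "interrupting" could mean on the strategy side, assuming the warm-up
loads pages in a loop: the loop checks a stop flag before each page, and a management command
(JMX, control.sh or anything else) only has to set that flag. The class and method names below
are hypothetical and no real management API is used.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.IntConsumer;

/** Toy warm-up that loads "pages" one by one and can be stopped from another thread. */
final class StoppableWarmUp {
    private final AtomicBoolean stopped = new AtomicBoolean();

    /** Would be called by a management command (hypothetical). */
    void stop() {
        stopped.set(true);
    }

    /**
     * Loads up to pageCnt pages, checking the stop flag before each one.
     *
     * @return Number of pages actually loaded.
     */
    int warmUp(int pageCnt, IntConsumer pageLoader) {
        int loaded = 0;

        for (int pageIdx = 0; pageIdx < pageCnt; pageIdx++) {
            if (stopped.get())
                break; // Node startup continues with whatever is already in memory.

            pageLoader.accept(pageIdx);

            loaded++;
        }

        return loaded;
    }
}

final class StoppableWarmUpDemo {
    /** Simulates an operator stopping the warm-up while page 2 is being loaded. */
    static int run() {
        StoppableWarmUp warmUp = new StoppableWarmUp();

        return warmUp.warmUp(10, pageIdx -> {
            if (pageIdx == 2)
                warmUp.stop();
        });
    }
}
```

In this toy run pages 0, 1 and 2 are loaded and the remaining seven are skipped.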

I was thinking about how and where to configure the warm-up, and I think it would be better
to do it in IgniteConfiguration, since each strategy can work for caches, groups, regions,
etc.
> I think that ideally warmup should be configured per-cache - I believe this is what a
> user would expect to do.
> However, cache configs are immutable. We need a way for existing users to enjoy the cache
> warmup feature, as well as for early adopters to switch to more sophisticated
> strategies as they will be released (or as their dataset grows).
> Because of that I propose to add the cache warmup configuration to the DataRegionConfiguration.
> Data regions can be changed between restarts, independently on each node allowing for
> a rolling change.
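
One way the node-level and the region-level options could coexist is a node-wide default with a
per-region override, similar to the default/per-cache pair of setters in the original proposal
quoted at the bottom. The sketch below is an assumption for illustration only: WarmUpMode and
NodeWarmUpSettings are hypothetical names, not part of IgniteConfiguration or
DataRegionConfiguration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical set of supported strategies. */
enum WarmUpMode { NONE, LOAD_ALL }

/** Hypothetical node-local settings: a default plus optional per-region overrides. */
final class NodeWarmUpSettings {
    private WarmUpMode dflt = WarmUpMode.NONE;

    private final Map<String, WarmUpMode> perRegion = new HashMap<>();

    NodeWarmUpSettings setDefault(WarmUpMode mode) {
        dflt = mode;

        return this;
    }

    NodeWarmUpSettings setForRegion(String regionName, WarmUpMode mode) {
        perRegion.put(regionName, mode);

        return this;
    }

    /** The region-level value wins; otherwise the node-wide default applies. */
    WarmUpMode resolve(String regionName) {
        return perRegion.getOrDefault(regionName, dflt);
    }
}
```

For example, `new NodeWarmUpSettings().setDefault(WarmUpMode.LOAD_ALL).setForRegion("big", WarmUpMode.NONE)`
would warm up every region except "big". Since the settings are node-local, they can differ
between restarts and between nodes.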

Possible.
> Will preloadPartition() method be deprecated together with this change? I assume yes?

I think it can be done as a new strategy, but this is at the discretion of the developers.
> How hard would it be to implement a "load all indexes, metapages and freelists" strategy
> in addition to the "load everything"?
> I think it would be an MVP for environments with datasets larger than RAM. A "load
> everything" strategy will not work in these environments pretty much at all,
> and "load indexes" will be a significant improvement over no warmup at all.
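
A minimal sketch of how such a strategy could differ from "load everything", assuming pages can
be classified by kind and that the warm-up has a page budget: both strategies become the same
loop with a different set of wanted page kinds. PageKind, PageRef and SelectiveWarmUp are
hypothetical names, not Ignite internals.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** Hypothetical coarse classification of pages in a data region. */
enum PageKind { META, FREE_LIST, INDEX, DATA }

/** Hypothetical page descriptor: an id plus its kind. */
final class PageRef {
    final long id;

    final PageKind kind;

    PageRef(long id, PageKind kind) {
        this.id = id;
        this.kind = kind;
    }
}

final class SelectiveWarmUp {
    /** "Load everything". */
    static final Set<PageKind> EVERYTHING = EnumSet.allOf(PageKind.class);

    /** "Load all indexes, metapages and freelists". */
    static final Set<PageKind> INDEX_STRUCTURES = EnumSet.of(PageKind.META, PageKind.FREE_LIST, PageKind.INDEX);

    /**
     * Picks the pages to load, in storage order.
     *
     * @return Ids of pages of the wanted kinds, at most maxPages of them.
     */
    static List<Long> select(List<PageRef> pages, Set<PageKind> wanted, int maxPages) {
        List<Long> res = new ArrayList<>();

        for (PageRef page : pages) {
            if (res.size() >= maxPages)
                break; // Budget exhausted, e.g. the region is smaller than the dataset.

            if (wanted.contains(page.kind))
                res.add(page.id);
        }

        return res;
    }
}
```

With a dataset larger than RAM, EVERYTHING simply stops at the budget, while INDEX_STRUCTURES
spends the same budget only on the pages that are expected to be hit most often.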

[1] - http://apache-ignite-developers.2346864.n4.nabble.com/DISCUSSION-Cache-warmup-td48582.html#a48649


04.08.2020, 23:22, "Stanislav Lukyanov" <stanlukyanov@gmail.com>:
> Kirill,
>
> Thanks for driving this. This is awaited by many users.
>
> A few comments and questions.
>
> I would keep CacheWarmup interface purely internal and never view it as an interface
> which a user would be implementing.
> There are multiple reasons for that:
> - The logic of the cache warmup is very low-level; how is a user supposed to know which
> pages they want?
> - A sophisticated strategy will require accessing private APIs for sure; say, I need
> a strategy which loads the last known memory state before the restart; how can I even implement
> that without breaking into various internals?
> - In fact there aren't many implementations which make sense ("load everything", "load
> indexes", "load last memory state", "load N GB at random"); every use case I've seen would
> be solved by a "load everything" strategy (if disk is < RAM) or "load last memory state"
> strategy
> - Warmup will be a critical phase, and a custom user implementation is all too likely
> to cause issues. We should avoid executing user code in critical stages if we can help it
> To summarize, if we put warmup strategies in users' hands they will be hard to write,
> will require breaking into internals or a lot of additional public interfaces for these internals,
> will likely cause issues with the cluster, and everyone will be implementing the same few
> general strategies.
> Basically, I expect only fellow Ignite developers to be implementing their own strategies.
> Because of that I propose to keep the interfaces private, and only give a single public
> parameter. The parameter can take an enum of the supported strategies. New useful strategies
> should be added to Ignite codebase.
>
> Will there be a way to interrupt warmup phase and continue startup (e.g. via JMX, REST
> and/or control.sh)? Can we have it please?
>
> I think that ideally warmup should be configured per-cache - I believe this is what a
> user would expect to do.
> However, cache configs are immutable. We need a way for existing users to enjoy the cache
> warmup feature, as well as for early adopters to switch to more sophisticated strategies as
> they will be released (or as their dataset grows).
> Because of that I propose to add the cache warmup configuration to the DataRegionConfiguration.
> Data regions can be changed between restarts, independently on each node allowing for a rolling
> change.
>
> Will preloadPartition() method be deprecated together with this change? I assume yes?
>
> How hard would it be to implement a "load all indexes, metapages and freelists" strategy
> in addition to the "load everything"?
> I think it would be an MVP for environments with datasets larger than RAM. A "load
> everything" strategy will not work in these environments pretty much at all, and "load indexes"
> will be a significant improvement over no warmup at all.
>
> Thanks,
> Stan
>
>>  On 4 Aug 2020, at 16:04, ткаленко кирилл <tkalkirill@yandex.ru> wrote:
>>
>>  Hi, Denis!
>>
>>  For now, I suggest a simple warm-up implementation for the case where the persistent storage
>>  is smaller than RAM. If others want to make additional implementations, they can do it themselves
>>  by implementing the interfaces. For the first point, we need to figure out how and where we will
>>  remember pages, etc. Perhaps such tasks will require improvements in the kernel.
>>
>>  In the "WarmUpStrategy#warmUp" method we get "GridKernalContext#cache", from which
>>  we can get caches and groups through "GridCacheProcessor#cacheGroups", "GridCacheProcessor#caches"
>>  and so on, and through them we can access pages.
>>>  The second one requires direct work with data pages, but not with a cache
>>>  context, so it's also impossible to implement.
>>
>>  This requires writing additional custom code, which may run longer due to its SQL
>>  features, and so on.
>>  It would be more convenient for both the developer and the grid administrator to just
>>  set a warm-up strategy.
>>>  When loading of all cache data is required, it can be done by running a
>>>  local scan query. It will iterate through all data pages and result in
>>>  their allocation in memory.
>>
>>  04.08.2020, 15:25, "Denis Mekhanikov" <dmekhanikov@gmail.com>:
>>>  Kirill,
>>>
>>>  When I discussed this functionality with Ignite users, I heard the
>>>  following thoughts about warming up:
>>>
>>>     - Node restarts affect performance of queries. The main reason for that
>>>     is that the pages that were loaded into memory before the restart are on
>>>     disk after the restart. It takes time to reach the same distribution of
>>>     data between memory and disk. Until that point the performance is usually
>>>     degraded. No simple rule like "load everything" helps here if only a part
>>>     of data fits in memory.
>>>     - It would be nice to have a way to give preferences to indices when
>>>     doing a warmup. Usually indices are used more often than data nodes, so
>>>     loading indices first would bring more benefits.
>>>
>>>  The first point can be addressed by implementing the policy that would
>>>  restore the memory state that was observed before the restart. I don't see
>>>  how it can be implemented using the suggested interface.
>>>  The second one requires direct work with data pages, but not with a cache
>>>  context, so it's also impossible to implement.
>>>
>>>  When loading of all cache data is required, it can be done by running a
>>>  local scan query. It will iterate through all data pages and result in
>>>  their allocation in memory.
>>>
>>>  So, I don't really see a scenario when the suggested API will help. Do you
>>>  have a suitable use-case that will be covered?
>>>
>>>  Denis
>>>
>>>  Tue, 4 Aug 2020 at 13:42, ткаленко кирилл <tkalkirill@yandex.ru>:
>>>
>>>>   Hi, Denis!
>>>>
>>>>   Previously, I answered Slava about the implementation that I have in mind:
>>>>   it will now be possible to add your own warm-up strategy implementations, which
>>>>   can be implemented in different ways.
>>>>
>>>>   At the moment, I suggest implementing one "Load all" strategy, which will
>>>>   be effective if persistent storage is less than RAM.
>>>>
>>>>   28.07.2020, 19:46, "Denis Mekhanikov" <dmekhanikov@gmail.com>:
>>>>   > Kirill,
>>>>   >
>>>>   > That will be a great feature! Other popular databases already have it
>>>>   > (e.g. Postgres: https://www.postgresql.org/docs/11/pgprewarm.html), so it's
>>>>   > good that we're also going to have it in Ignite.
>>>>   >
>>>>   > What implementation of CacheWarmup interface do you have in mind? Will
>>>>   > there be some preconfigured implementation, and will users be able to
>>>>   > implement it themselves?
>>>>   >
>>>>   > Do you think it should be cache-based? I would say that a DataRegion-based
>>>>   > warm-up would come more naturally. Page IDs that are loaded into the data
>>>>   > region can be dumped periodically to disk and recovered on restarts. This
>>>>   > is more or less how it works in Postgres.
>>>>   > I'm afraid that if we make it cache-based, the implementation won't be that
>>>>   > obvious. We already have an API for warmup that appeared to be pretty much
>>>>   > impossible to apply in a useful way:
>>>>   > https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/IgniteCache.html#preloadPartition-int-
>>>>   > Let's make sure that our new tool for warming up is actually useful.
>>>>   >
>>>>   > Denis
>>>>   >
>>>>   > Tue, 28 Jul 2020 at 09:17, Zhenya Stanilovsky <arzamas123@mail.ru.invalid>:
>>>>   >
>>>>   >> Looks like we need an additional function for static caches, for
>>>>   >> example: warmup(List<CacheConfiguration> cconf). It would be helpful for
>>>>   >> Spring too.
>>>>   >>
>>>>   >> >
>>>>   >> >------- Forwarded message -------
>>>>   >> >From: "Вячеслав Коптилин" <slava.koptilin@gmail.com>
>>>>   >> >To: dev@ignite.apache.org
>>>>   >> >Cc:
>>>>   >> >Subject: Re: [DISCUSSION] Cache warmup
>>>>   >> >Date: Mon, 27 Jul 2020 16:47:48 +0300
>>>>   >> >
>>>>   >> >Hello Kirill,
>>>>   >> >
>>>>   >> >Thanks a lot for driving this activity. If I am not mistaken, this
>>>>   >> >discussion relates to IEP-40.
>>>>   >> >
>>>>   >> >> I suggest adding a warmup phase after recovery here [1] after [2],
>>>>   >> >> before discovery.
>>>>   >> >This means that the user's thread, which starts Ignite via
>>>>   >> >Ignition.start(), will wait for an additional step - cache warm-up.
>>>>   >> >I think this fact has to be clearly mentioned in our documentation (in
>>>>   >> >the Javadoc at least) because this step can be time-consuming.
>>>>   >> >
>>>>   >> >> I suggest adding a new interface:
>>>>   >> >I would change it a bit. First of all, it would be nice to place this
>>>>   >> >interface in a public package and get rid of using GridCacheContext,
>>>>   >> >which is an internal class and should not leak to the public API in any
>>>>   >> >case.
>>>>   >> >Perhaps this parameter is not needed at all, or we should add some public
>>>>   >> >abstraction instead of the internal class.
>>>>   >> >package org.apache.ignite.configuration;
>>>>   >> >
>>>>   >> >import org.apache.ignite.IgniteCheckedException;
>>>>   >> >import org.apache.ignite.lang.IgniteFuture;
>>>>   >> >
>>>>   >> >public interface CacheWarmupper {
>>>>   >> >    /**
>>>>   >> >     * Warmup cache.
>>>>   >> >     *
>>>>   >> >     * @param cachename Cache name.
>>>>   >> >     * @return Future cache warmup.
>>>>   >> >     * @throws IgniteCheckedException If failed.
>>>>   >> >     */
>>>>   >> >    IgniteFuture<?> warmup(String cachename) throws IgniteCheckedException;
>>>>   >> >}
>>>>   >> >
>>>>   >> >Thanks,
>>>>   >> >S.
>>>>   >> >
>>>>   >> >Mon, 27 Jul 2020 at 15:03, ткаленко кирилл <tkalkirill@yandex.ru>:
>>>>   >> >
>>>>   >> >> Now, after restarting a node, we have only cold caches, which on the first
>>>>   >> >> requests to them will gradually load data from disk, which can slow down
>>>>   >> >> the first calls to them.
>>>>   >> >> If a node has more RAM than data on disk, then the data can be loaded at start
>>>>   >> >> ("warmup"), thereby solving the issue of slowdowns during the first calls to
>>>>   >> >> caches.
>>>>   >> >>
>>>>   >> >> I suggest adding a warmup phase after recovery here [1] after [2], before
>>>>   >> >> discovery.
>>>>   >> >>
>>>>   >> >> I suggest adding a new interface:
>>>>   >> >>
>>>>   >> >> package org.apache.ignite.internal.processors.cache;
>>>>   >> >>
>>>>   >> >> import org.apache.ignite.IgniteCheckedException;
>>>>   >> >> import org.apache.ignite.internal.IgniteInternalFuture;
>>>>   >> >> import org.jetbrains.annotations.Nullable;
>>>>   >> >>
>>>>   >> >> /**
>>>>   >> >>  * Interface for warming up cache.
>>>>   >> >>  */
>>>>   >> >> public interface CacheWarmup {
>>>>   >> >>     /**
>>>>   >> >>      * Warmup cache.
>>>>   >> >>      *
>>>>   >> >>      * @param cacheCtx Cache context.
>>>>   >> >>      * @return Future cache warmup.
>>>>   >> >>      * @throws IgniteCheckedException if failed.
>>>>   >> >>      */
>>>>   >> >>     @Nullable IgniteInternalFuture<?> process(GridCacheContext cacheCtx)
>>>>   >> >>         throws IgniteCheckedException;
>>>>   >> >> }
>>>>   >> >>
>>>>   >> >> This will allow warming up caches in parallel and asynchronously. The warmup
>>>>   >> >> phase will end after the IgniteInternalFuture for every cache isDone.
>>>>   >> >>
>>>>   >> >> Also adding the ability to customize via methods:
>>>>   >> >>
>>>>   >> >> org.apache.ignite.configuration.IgniteConfiguration#setDefaultCacheWarmup
>>>>   >> >> org.apache.ignite.configuration.CacheConfiguration#setCacheWarmup
>>>>   >> >>
>>>>   >> >> This will allow setting an implementation of cache warming up
>>>>   >> >> both for a specific cache and for all caches if necessary.
>>>>   >> >>
>>>>   >> >> I suggest adding an implementation of SequentialWarmup that will use [3].
>>>>   >> >>
>>>>   >> >> Questions, suggestions, comments?
>>>>   >> >>
>>>>   >> >> [1] - org.apache.ignite.internal.processors.cache.GridCacheProcessor.CacheRecoveryLifecycle#afterLogicalUpdatesApplied
>>>>   >> >> [2] - org.apache.ignite.internal.processors.cache.GridCacheProcessor.CacheRecoveryLifecycle#restorePartitionStates
>>>>   >> >> [3] - org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManager.CacheDataStore#preload
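
The original proposal at the bottom of this thread ends the warm-up phase once the future of
every cache is done. A minimal sketch of that idea follows. It uses java.util.concurrent
CompletableFuture in place of IgniteInternalFuture so that it is self-contained, and the class
name, the pool size and the "warm-up" body are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ParallelWarmUpPhase {
    /**
     * Starts one asynchronous warm-up per cache and blocks until all of them are done,
     * the way a startup thread would wait before moving on to discovery.
     *
     * @return Names of the caches that were warmed up.
     */
    static Set<String> run(List<String> cacheNames) {
        Set<String> warmed = ConcurrentHashMap.newKeySet();

        ExecutorService pool = Executors.newFixedThreadPool(4);

        try {
            List<CompletableFuture<Void>> futs = new ArrayList<>();

            for (String cacheName : cacheNames) {
                // A real implementation would preload the partitions of the cache here.
                futs.add(CompletableFuture.runAsync(() -> warmed.add(cacheName), pool));
            }

            // The warm-up phase ends only when every per-cache future is done.
            CompletableFuture.allOf(futs.toArray(new CompletableFuture[0])).join();
        }
        finally {
            pool.shutdown();
        }

        return warmed;
    }
}
```

A sequential variant would be the same loop with a single-threaded pool.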
