ignite-dev mailing list archives

From Alex Plehanov <plehanov.a...@gmail.com>
Subject Re: [DISCUSS] Page replacement improvement
Date Fri, 20 Nov 2020 09:05:34 GMT
Zhenya,

> Alexey, we already have changes that partially fix this issue [1]
IGNITE-13086 is a minor improvement. We still have major problems with
our page replacement algorithm (slow page selection and a non-optimal
page-fault rate). I think changing from 5 random pages to 7 will make
things even worse (it's better for the page-fault rate, but page selection
will be slower).
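
To illustrate the trade-off, here is a minimal sketch of sampling-based victim selection. It is a simplified model, not Ignite's actual code: the class name, the method name, and the flat array of last-access timestamps are all assumptions made for the example. The loop makes the cost visible: each selection touches sampleSize random pages, so going from 5 to 7 adds work to every replacement while only improving the odds of picking a colder page.

```java
import java.util.Random;

/** Simplified model of sampling-based ("Random-LRU") victim selection; all names are hypothetical. */
final class RandomLruSketch {
    /**
     * Samples sampleSize random slots and returns the index of the sampled slot
     * with the oldest last-access timestamp, or -1 if sampleSize is 0.
     * The cost of one selection grows linearly with sampleSize.
     */
    static int selectVictim(long[] lastAccessTs, int sampleSize, Random rnd) {
        int victim = -1;

        for (int i = 0; i < sampleSize; i++) {
            int candidate = rnd.nextInt(lastAccessTs.length);

            // Keep the coldest page seen so far among the sampled ones.
            if (victim < 0 || lastAccessTs[candidate] < lastAccessTs[victim])
                victim = candidate;
        }

        return victim;
    }
}
```

The real implementation also has to skip pages that cannot be evicted right now, which is where the worst case of scanning the whole segment comes from; the sketch ignores that.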

> This approach still not applicable for real life
Why do you think batch replacement is not applicable to real life? It can
be applied to workloads where a large amount of data is used periodically,
but not very often. For example, when an OLAP request over historical data
loads pages into page-memory, and after the request this data is not needed
for a long time. Or when OLTP transactions mostly add new data and process
recent data but rarely touch historical data. In these cases, with the
current approach, we enter "page replacement mode" after some period
of time and never leave it. With batch page replacement there is a chance
to prevent Random-LRU page replacement or postpone it.
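
A rough sketch of what I mean by batch replacement, under stated assumptions: the class, the method names, and the map from page id to last-access time are hypothetical and only model the idea. The trigger mirrors the conditions from the original proposal (more than 90% of page-memory allocated, enough cold pages, enough time since the last run); the concrete limits are parameters, not proposed defaults.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative model of batch eviction of cold pages; all names are hypothetical. */
final class BatchReplacementSketch {
    /** Returns ids of pages that were not touched for longer than coldThresholdMs. */
    static List<Long> selectColdPages(Map<Long, Long> lastAccessMs, long nowMs, long coldThresholdMs) {
        List<Long> cold = new ArrayList<>();

        for (Map.Entry<Long, Long> e : lastAccessMs.entrySet()) {
            if (nowMs - e.getValue() > coldThresholdMs)
                cold.add(e.getKey());
        }

        return cold;
    }

    /**
     * Example automatic trigger: more than 90% of the region is allocated,
     * there are enough cold pages, and enough time has passed since the last batch run.
     */
    static boolean shouldStart(long allocatedPages, long maxPages, long coldPages,
        long minColdPages, long msSinceLastRun, long minIntervalMs) {
        return allocatedPages * 10 > maxPages * 9
            && coldPages >= minColdPages
            && msSinceLastRun >= minIntervalMs;
    }
}
```

With a 12-hour threshold, historical pages loaded by a one-off OLAP request would be selected by selectColdPages on the next run, while recently touched OLTP pages would stay in memory.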

> But request once more, do you really observe such problems with 2.9 ver ?
> Any graphs maybe ?
I don't have production usage feedback after IGNITE-13086, but I doubt
anything changed significantly.


Thu, 19 Nov 2020 at 09:18, Zhenya Stanilovsky <arzamas123@mail.ru.invalid>:

>
> Alexey, we already have changes that partially fix this issue [1]
> Easy way:
> Looks like we already have convergence in page replacement.
> If we change the 5-times touch iterator in the Random-LRU algo to, for
> example, 7, we will obtain a fast improvement from scratch.
>
> » Batch page replacement
> This approach is still not applicable for real life if you want to observe
> ugly people for threshold (i.e. 12 h) interval. And, of course, you
> understand that dramatically reducing such an interval gives nothing?
>
> » Change the page replacement algorithm.
> That's the way I vote for ) But request once more, do you really observe such
> problems with 2.9 ver ? Any graphs maybe ?
>
> thanks !
>
> [1] https://issues.apache.org/jira/browse/IGNITE-13086
> >Hello, Igniters!
> >
> >Currently, for page replacement (page rotation between page-memory and
> >disk) we use the Random-LRU algorithm. It has a low maintenance cost and a
> >relatively simple implementation, but it has many disadvantages and
> >affects performance significantly once replacement starts. We even have
> >warnings in the log when page replacement starts and a special event for
> >this. I know Ignite deployments where administrators are forced to restart
> >cluster nodes periodically to avoid page replacement.
> >
> >I have a couple of proposals to improve page replacement in Ignite:
> >
> >*Batch page replacement.*
> >
> >Main idea: in some cases, start a background task to evict cold pages from
> >page-memory (for example, pages last touched more than 12 hours ago).
> >
> >The task can be started:
> >- Automatically, triggered by some events, for example, when we expect
> >Random-LRU page replacement to start soon (more than 90% of page-memory is
> >allocated) + we have enough cold pages (we need some metric to
> >calculate the number of cold pages) + some time has passed since the last
> >batch page replacement (to avoid too much resource consumption by
> >background batch replacement).
> >- Manually (JMX or control.sh), if an administrator wants to control the
> >time of batch replacement more precisely (for example, to avoid the start
> >of this task during peak time).
> >
> >Batch page replacement will be helpful in some workloads (when some data
> >is much colder than other data). It can prevent Random-LRU page
> >replacement from starting, or, if Random-LRU has already started, it can
> >provide conditions to stop it.
> >
> >*Change the page replacement algorithm.*
> >
> >A good page replacement algorithm should satisfy these requirements:
> >- low page-fault rates for a typical workload
> >- low maintenance cost (low resource consumption to maintain additional
> >structures required for page replacement)
> >- fast search for the next page to replace
> >- sequential scan resistance (one sequential scan should not evict all
> >relatively hot pages from page-memory)
> >
> >Our Random-LRU has a low maintenance cost and is sequential-scan
> >resistant, but to find the next page for replacement we scan 5 pages in
> >the best case and the whole data region segment in the worst case. Also,
> >due to its random nature, it's not very effective at predicting the right
> >page for replacement to minimize the page-fault rate. And it takes a long
> >time to totally evict old cold data.
> >
> >Usually, database management systems and operating systems use
> >modifications of LRU algorithms. These algorithms have higher maintenance
> >costs (the page list has to be modified on each page access), but they are
> >often effective from a "page-fault rate" point of view and have O(1)
> >complexity for finding a page to replace. Simple LRU is not sequential
> >scan resistant, but modifications that utilize page access frequency are
> >resistant to sequential scans.
> >
> >We can try one of the modifications of LRU as well (for example,
> >"segmented LRU" seems suitable for Ignite).
> >
> >Ignite is a memory-centric product, so "low maintenance cost" is
> >critical. And there is a risk that the page replacement algorithm can
> >affect workloads where page replacement is not used (there is enough RAM
> >to store all data). Of course, any page replacement solution should be
> >carefully benchmarked.
> >
> >
> >Igniters, WDYT? If any of these proposals look reasonable to you, I will
> >create an IEP and start implementation.
> >
> >Also, I have a draft implementation of a system view to determine how hot
> >pages in page-memory are [1]. I think it will be useful for any of these
> >approaches (and even if we decide to leave page replacement as is).
> >
> >[1]: https://issues.apache.org/jira/browse/IGNITE-13726
> >
>
>
>
>