ignite-dev mailing list archives

From Zhenya Stanilovsky <arzamas...@mail.ru.INVALID>
Subject Re[2]: [DISCUSS] Page replacement improvement
Date Fri, 20 Nov 2020 10:31:52 GMT

>> Alexey, we already have changes that partially fix this issue [1]
>IGNITE-13086 is a minor improvement. We still have major problems with
>our page replacement algorithm (slow page selection and non-optimal
>page-fault rate). I think changing from random 5 pages to 7 will make
>things even worse (it's better for page-fault rate, but page selection will
>be slower).
All of the claims above need to be proven, I think. +1 to Nikolay: we need a proper reproducer
or some graphs from version 2.9.
>> This approach is still not applicable to real life
>Why do you think batch replacement is not applicable to real life? It can
>be applied to workloads where a large amount of data is used periodically,
>but not very often. For example, when an OLAP request over historical data
>loads pages into page-memory, and after that request the data is not needed
>for a long time. Or when OLTP transactions mostly add new data and process
>recent data but rarely touch historical data. In these cases, with the
>current approach, we will enter "page replacement mode" after some period
>of time and never leave it. With batch page replacement there is a chance
>to prevent Random-LRU page replacement or postpone it.
I understand the case you described, but who will pull the switch? A human? An artificial intelligence?
Your approach assumes some internal triggering, and I don't like that.
>> But I ask once more: do you really observe such problems with version 2.9?
>> Any graphs, maybe?
>I don't have production usage feedback after IGNITE-13086, but I doubt
>something changed significantly.
Let's wait? :) In any case (Nikolay, Alex), IGNITE-13086 includes a Yardstick benchmark that was
used to validate that PR; we can use it once more.
Thanks!
>Thu, 19 Nov 2020 at 09:18, Zhenya Stanilovsky <arzamas123@mail.ru.invalid>:
>> Alexey, we already have changes that partially fix this issue [1]
>> Easy way:
>> It looks like we already have convergence in page replacement.
>> If we change the number of pages touched per iteration of the Random-LRU
>> algorithm from 5 to, for example, 7, we will get a quick improvement.
>> » Batch page replacement
>> This approach is still not applicable to real life if you don't want to
>> face angry people for the threshold interval (e.g. 12 h). And, of course,
>> you understand that dramatically reducing that interval gives nothing?
>> » Change the page replacement algorithm.
>> That's the way I vote for. :) But I ask once more: do you really observe
>> such problems with version 2.9? Any graphs, maybe?
>> Thanks!
>> [1]  https://issues.apache.org/jira/browse/IGNITE-13086
>> >Hello, Igniters!
>> >
>> >Currently, for page replacement (page rotation between page-memory and
>> >disk) we use the Random-LRU algorithm. It has a low maintenance cost and a
>> >relatively simple implementation, but it has many disadvantages and
>> >significantly affects performance once replacement has started. We even
>> >log a warning when page replacement starts and have a special event for
>> >it. I know of Ignite deployments where administrators are forced to
>> >restart cluster nodes periodically to avoid page replacement.
>> >
>> >I have a couple of proposals to improve page replacement in Ignite:
>> >
>> >*Batch page replacement.*
>> >
>> >Main idea: in some cases, start a background task to evict cold pages from
>> >page-memory (for example, pages last touched more than 12 hours ago).
>> >
>> >The task can be started:
>> >- Automatically, triggered by some events, for example, when we expect
>> >Random-LRU page replacement to start soon (more than 90% of page-memory
>> >is allocated) + we have enough cold pages (we need some metric to
>> >calculate the number of cold pages) + some time has passed since the last
>> >batch page replacement (to avoid too much resource consumption by
>> >background batch replacement).
>> >- Manually (JMX or control.sh), if an administrator wants to control the
>> >time of batch replacement more precisely (for example, to avoid the start
>> >of this task during peak time).
>> >
>> >Batch page replacement will be helpful in some workloads (when some data
>> >is much colder than the rest): it can prevent Random-LRU page replacement
>> >from starting, or, if Random-LRU has already started, it can provide
>> >conditions to stop it.
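
[Editor's sketch] The automatic trigger described above combines three conditions: page-memory is nearly full, there are enough cold pages, and enough time has passed since the last batch run. A minimal Java sketch of such a check follows. The class name, the parameters, and the idea of expressing "enough cold pages" as a fraction of all pages are assumptions made for illustration; none of this is an existing Ignite API, and only the 90% figure comes from the proposal.

```java
import java.time.Duration;

/** Hypothetical trigger check for background batch page replacement (not an Ignite API). */
final class BatchReplacementTrigger {
    private final double allocatedThreshold; // e.g. 0.9, the "more than 90% allocated" example
    private final double minColdFraction;    // assumed: minimum share of cold pages worth evicting
    private final Duration minInterval;      // assumed: minimum pause between two batch runs

    BatchReplacementTrigger(double allocatedThreshold, double minColdFraction, Duration minInterval) {
        this.allocatedThreshold = allocatedThreshold;
        this.minColdFraction = minColdFraction;
        this.minInterval = minInterval;
    }

    /** Returns true only when all three conditions from the proposal hold. */
    boolean shouldStart(long allocatedPages, long totalPages, long coldPages, Duration sinceLastBatch) {
        if (totalPages <= 0)
            return false;

        boolean nearlyFull = (double) allocatedPages / totalPages > allocatedThreshold;
        boolean enoughCold = (double) coldPages / totalPages >= minColdFraction;
        boolean cooledDown = sinceLastBatch.compareTo(minInterval) >= 0;

        return nearlyFull && enoughCold && cooledDown;
    }
}
```

The cold-page count is taken as an input here because, as the proposal notes, a metric for it would still have to be defined.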
>> >
>> >*Change the page replacement algorithm.*
>> >
>> >A good page replacement algorithm should satisfy the following requirements:
>> >- low page-fault rates for typical workloads
>> >- low maintenance cost (low resource consumption to maintain additional
>> >structures required for page replacement)
>> >- fast search for the next page to replace
>> >- sequential scan resistance (one sequential scan should not evict all
>> >relatively hot pages from page-memory)
>> >
>> >Our Random-LRU has a low maintenance cost and is sequential-scan
>> >resistant, but to find the next page for replacement we scan 5 pages in
>> >the best case and can scan the whole data region segment in the worst
>> >case. Also, due to its random nature, it's not very effective at
>> >predicting the right page for replacement to minimize the page-fault
>> >rate. And it takes a long time to completely evict old cold data.
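
[Editor's sketch] To make the sampling idea concrete, here is a simplified Java sketch of random-sample LRU victim selection: pick a few random pages and evict the one touched longest ago. It is an illustration under assumptions, not Ignite's actual implementation; the class and method names are hypothetical, and it ignores the cases (mentioned above) in which the real algorithm may have to scan much more than the initial sample.

```java
import java.util.Random;

/** Simplified random-sample LRU victim selection (illustrative only, not Ignite code). */
final class RandomLruSketch {
    /**
     * Samples {@code samples} random pages and returns the index of the one
     * with the smallest (oldest) last-touched timestamp among the sampled pages.
     */
    static int pickVictim(long[] lastTouched, int samples, Random rnd) {
        if (lastTouched.length == 0 || samples <= 0)
            throw new IllegalArgumentException("need at least one page and one sample");

        int victim = -1;
        long oldest = Long.MAX_VALUE;

        for (int i = 0; i < samples; i++) {
            int idx = rnd.nextInt(lastTouched.length);

            if (victim < 0 || lastTouched[idx] < oldest) {
                oldest = lastTouched[idx];
                victim = idx;
            }
        }

        return victim;
    }
}
```

The `samples` parameter is the knob discussed in this thread (5 versus 7): a larger sample is more likely to include a genuinely cold page, at the price of more work per selection.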
>> >
>> >Usually, database management systems and operating systems use
>> >modifications of LRU algorithms. These algorithms have higher maintenance
>> >costs (the page list has to be modified on each page access), but they are
>> >often effective from a "page-fault rate" point of view and have O(1)
>> >complexity for finding a page to replace. Simple LRU is not sequential
>> >scan resistant, but modifications that utilize page access frequency are
>> >resistant to sequential scans.
>> >
>> >We can try one of the modifications of LRU as well (for example,
>> >"segmented LRU" seems suitable for Ignite).
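
[Editor's sketch] For readers unfamiliar with segmented LRU, the following Java sketch shows the general technique: new pages enter a probationary segment, a second access promotes a page to a protected segment, and victims are taken from the probationary segment first. This is a textbook-style illustration with hypothetical names and capacities, not a proposal for how Ignite would implement it.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;

/** Minimal segmented LRU over page ids (illustrative only). */
final class SegmentedLruSketch {
    private final int probationCap;
    private final int protectedCap;
    // LinkedHashSet keeps insertion order: first element = least recently used.
    private final LinkedHashSet<Long> probation = new LinkedHashSet<>();
    private final LinkedHashSet<Long> protectedSeg = new LinkedHashSet<>();

    SegmentedLruSketch(int probationCap, int protectedCap) {
        this.probationCap = probationCap;
        this.protectedCap = protectedCap;
    }

    /** Records an access to a page; returns the evicted page id, or -1 if nothing was evicted. */
    long access(long pageId) {
        if (protectedSeg.remove(pageId)) {      // hit in protected: move to MRU position
            protectedSeg.add(pageId);
            return -1;
        }
        if (probation.remove(pageId)) {         // second access: promote to protected
            protectedSeg.add(pageId);
            if (protectedSeg.size() > protectedCap)
                probation.add(removeEldest(protectedSeg)); // demote, do not evict
            return -1;
        }
        probation.add(pageId);                  // miss: new pages start in probation
        return probation.size() > probationCap ? removeEldest(probation) : -1;
    }

    boolean contains(long pageId) {
        return probation.contains(pageId) || protectedSeg.contains(pageId);
    }

    private static long removeEldest(LinkedHashSet<Long> seg) {
        Iterator<Long> it = seg.iterator();
        long eldest = it.next();
        it.remove();
        return eldest;
    }
}
```

Because a one-pass scan only ever touches the probationary segment, pages that were accessed at least twice survive it, which is the scan resistance property listed in the requirements above. The cost is the per-access list update that the proposal flags as a maintenance concern.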
>> >
>> >Ignite is a memory-centric product, so "low maintenance cost" is very
>> >critical. And there is a risk that the page replacement algorithm can
>> >affect workloads where page replacement is not used (enough RAM to store
>> >all data). Of course, any page replacement solution should be carefully
>> >benchmarked.
>> >
>> >
>> >Igniters, WDYT? If any of these proposals look reasonable to you, I will
>> >create an IEP and start implementation.
>> >
>> >Also, I have a draft implementation of a system view to determine how hot
>> >pages in page-memory are [1]. I think it will be useful for any of these
>> >approaches (and even if we decide to leave page replacement as is).
>> >
>> >[1]:  https://issues.apache.org/jira/browse/IGNITE-13726
>> >