ignite-dev mailing list archives

From Vladimir Ozerov <voze...@gridgain.com>
Subject Re: Page replacement policy improvements (when persistent is enabled)
Date Thu, 16 Aug 2018 09:01:25 GMT
Hi Dima,

Putting index pages in a separate region is the wrong approach, because data
pages may be equally important on certain workloads, especially in
heap-organized databases such as Ignite.
At the moment we'd better focus on monitoring to better understand usage
patterns. This would give us solid ground for further decisions.

Vladimir.

On Sat, Aug 4, 2018 at 12:06 AM Dmitriy Setrakyan <dsetrakyan@apache.org>
wrote:

> Vladimir,
>
> Are we only tracking the timestamp of the last access? In that case, it would
> create a problem. We should also count the number of times a page has been
> touched within a certain time frame, e.g. the last hour or so. In that case,
> index pages would not be evicted, as they get touched the most.
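The windowed counting idea above can be sketched as follows. This is a purely illustrative example, not Ignite code; the class and method names (`WindowedAccessCounter`, `touch`, `score`) are hypothetical:

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch: score a page by how many times it was touched
 * within a recent time window, instead of only its last-access timestamp.
 * Pages touched more often score higher and would be evicted last.
 */
class WindowedAccessCounter {
    private final long windowMillis;
    private final Deque<Long> hits = new ArrayDeque<>();

    WindowedAccessCounter(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Record one page access at the given time. */
    void touch(long nowMillis) {
        hits.addLast(nowMillis);
        expire(nowMillis);
    }

    /** Number of accesses still inside the window; higher = hotter. */
    int score(long nowMillis) {
        expire(nowMillis);
        return hits.size();
    }

    private void expire(long nowMillis) {
        while (!hits.isEmpty() && nowMillis - hits.peekFirst() > windowMillis)
            hits.removeFirst();
    }
}
```

A per-page deque is of course too heavy for a real page cache; a production version would use coarse counters or exponential decay instead.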
>
> I would also consider putting index pages into a separate memory region.
> This way you can apply a different eviction policy to the index pages or
> decide not to evict them altogether. This will also be a much simpler and
> less error-prone approach than introducing new eviction policies.
>
> D.
>
> On Fri, Aug 3, 2018 at 12:19 AM, Vladimir Ozerov <vozerov@gridgain.com>
> wrote:
>
> > Igniters,
> >
> > I have heard some complaints about our page replacement algorithm: index
> > pages can be evicted from memory too often. I reviewed our current
> > implementation, and it looks like we have chosen a very simple approach with
> > eviction of random pages, without taking into account their nature (data vs
> > index) or typical usage patterns (such as scans).
> >
> > With our heap-based architecture a typical SQL query is executed as
> > follows:
> > 1) Read non-leaf index pages, then in a loop:
> > 2.1) Read 1 leaf index page
> > 2.2) Read several hundred data pages
> >
> > This way index pages on average have smaller timestamps than data pages
> > and have a good probability of being evicted.
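A toy illustration of the timestamp skew described above (all numbers and names hypothetical): one leaf index page is read first, then hundreds of data pages are read after it, so under "evict the page with the oldest timestamp" the index page is the certain victim:

```java
/**
 * Toy demo of the skew: slot 0 is the leaf index page, read first;
 * slots 1..n are data pages read afterwards in a loop. The index page
 * ends up with the oldest last-access timestamp.
 */
class TimestampSkewDemo {
    static long[] simulateQuery(int dataPages) {
        long clock = 0;
        long[] lastAccess = new long[dataPages + 1];
        lastAccess[0] = clock++;          // leaf index page, read first
        for (int i = 1; i <= dataPages; i++)
            lastAccess[i] = clock++;      // data pages read afterwards
        return lastAccess;
    }

    /** Index of the page with the smallest (oldest) last-access timestamp. */
    static int oldest(long[] lastAccess) {
        int victim = 0;
        for (int i = 1; i < lastAccess.length; i++)
            if (lastAccess[i] < lastAccess[victim]) victim = i;
        return victim;
    }
}
```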
> >
> > Another major problem is scan resistance, which doesn't seem to be
> > covered at all.
> >
> > My question is: what was the reason for choosing the random-pseudo-LRU
> > algorithm instead of a commonly used variation of *real* LRU (such as
> > LRU-K, 2Q, etc.)? Did we perform any evaluation of its effectiveness?
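For reference, a heavily simplified 2Q sketch (names and sizes are illustrative, not from Ignite): first-time pages enter a small FIFO (A1); only pages referenced a second time are promoted to the main LRU (Am). This is what gives 2Q its scan resistance: a one-pass scan flows through A1 and never displaces hot pages in Am.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;

/** Simplified 2Q: A1 is a FIFO for first-time pages, Am an LRU for hot pages. */
class SimpleTwoQ<K> {
    private final int a1Capacity;
    private final int amCapacity;
    private final LinkedHashSet<K> a1 = new LinkedHashSet<>();  // FIFO order
    private final LinkedHashMap<K, Boolean> am =                // access order = LRU
        new LinkedHashMap<>(16, 0.75f, true);

    SimpleTwoQ(int a1Capacity, int amCapacity) {
        this.a1Capacity = a1Capacity;
        this.amCapacity = amCapacity;
    }

    /** Access a page; returns true on a cache hit. */
    boolean access(K page) {
        if (am.containsKey(page)) {
            am.get(page);                   // refresh recency in the LRU
            return true;
        }
        if (a1.remove(page)) {              // second touch: promote to Am
            am.put(page, Boolean.TRUE);
            if (am.size() > amCapacity)
                am.remove(am.keySet().iterator().next()); // evict LRU entry
            return true;
        }
        a1.add(page);                       // first touch: enter the FIFO
        if (a1.size() > a1Capacity)
            a1.remove(a1.iterator().next()); // evict oldest FIFO entry
        return false;
    }

    boolean inMain(K page) { return am.containsKey(page); }
}
```

The full 2Q algorithm additionally keeps a ghost queue (A1out) of recently evicted identifiers, which this sketch omits.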
> >
> > I am thinking of creating a new IEP to evaluate and possibly improve our
> > page replacement as follows:
> > 1) Implement metrics to count page cache hits/misses by page type [1]
> > 2) Implement a *heat map* which can optionally be enabled to track page
> > hits/misses per page or per specific object (cache, index)
> > 3) Run the heat map on typical workloads (lookups, scans, joins, etc.) to
> > get a baseline
> > 4) Prototype several LRU-based implementations and see if they give any
> > benefit. It makes sense to start with minor improvements to the current
> > algorithm (e.g. favor index pages over data pages, play with the sample
> > size, replace timestamps with read counters, etc.).
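One of the minor tweaks from item 4 can be sketched as follows (all names hypothetical, not Ignite APIs): keep the existing "pick a victim from a random sample" scheme, but rank candidates by a read counter instead of a raw timestamp, and make index pages look hotter by a configurable weight:

```java
import java.util.List;

/**
 * Hypothetical victim selection over a random sample of candidate pages:
 * lowest read-counter score loses, and index pages get their score
 * multiplied by a weight so they are evicted less eagerly.
 */
class WeightedVictimSelector {
    /** Candidate page drawn into the random sample. */
    static class Page {
        final long id;
        final boolean index;   // index vs data page
        final long reads;      // read counter instead of last-access timestamp

        Page(long id, boolean index, long reads) {
            this.id = id; this.index = index; this.reads = reads;
        }
    }

    private final double indexWeight;

    WeightedVictimSelector(double indexWeight) {
        this.indexWeight = indexWeight;
    }

    /** Pick the coldest page, treating index pages as indexWeight times hotter. */
    Page selectVictim(List<Page> sample) {
        Page victim = null;
        double best = Double.MAX_VALUE;
        for (Page p : sample) {
            double score = p.reads * (p.index ? indexWeight : 1.0);
            if (score < best) { best = score; victim = p; }
        }
        return victim;
    }
}
```

With a weight of 4.0, an index page read 5 times scores 20 and loses to a data page read 10 times (score 10), so the data page is evicted even though it was read more often.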
> >
> > In any case, the first two action items would be a good addition to
> > product monitoring.
> >
> > What do you think?
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-8580
> >
>
