ignite-dev mailing list archives

From Vladimir Ozerov <voze...@gridgain.com>
Subject Re: memory-only mode for Ignite indexes
Date Mon, 07 May 2018 09:09:19 GMT
Hi Dima,

An update with indexes will definitely be slower than an update without them.
The question is how much slower. For now, the slowdown comes mostly from
excessive data page reads ([1] and [2] in my previous email), which lead to
page evictions and additional IO. By contrast, usually only a single page
write is needed to update an index. A correct index implementation ([1] and
[2] from my previous email) would eliminate data page reads altogether and
should give a dramatic speedup.
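To make the cost of data page reads concrete, here is a toy Python model (not Ignite code) of an index that stores a fixed-size inline prefix of each key. When two inline prefixes tie and the stored key was truncated, the full key has to be fetched from its data page. The names `InlineIndex`, `IndexEntry`, and `count_reads`, the character-based inline size, and binary search standing in for a BTree descent are all assumptions for illustration. The model covers only the inline-size effect ([1]), not the primary-key comparisons ([2]).

```python
from dataclasses import dataclass


@dataclass
class IndexEntry:
    inline: str      # fixed-size key prefix stored in the index page
    link: int        # pointer to the full row in a (simulated) data page
    truncated: bool  # True if the key did not fit into the inline area


class InlineIndex:
    """Toy sorted index (hypothetical): counts data-page reads caused by truncated keys."""

    def __init__(self, inline_size):
        self.inline_size = inline_size
        self.entries = []        # sorted IndexEntry objects ("index pages")
        self.data_pages = {}     # link -> full key ("data pages")
        self.data_page_reads = 0

    def _full_key(self, entry):
        self.data_page_reads += 1  # simulated data page lookup
        return self.data_pages[entry.link]

    def _compare(self, key, entry):
        prefix = key[: self.inline_size]
        if prefix != entry.inline:
            # Differing prefixes already decide the order: no lookup needed.
            return -1 if prefix < entry.inline else 1
        # Prefixes tie: an untruncated entry holds its whole key inline,
        # otherwise the full key must be read from the data page.
        other = entry.inline if not entry.truncated else self._full_key(entry)
        return (key > other) - (key < other)

    def insert(self, key, link):
        self.data_pages[link] = key
        lo, hi = 0, len(self.entries)
        while lo < hi:  # binary search stands in for a BTree descent
            mid = (lo + hi) // 2
            if self._compare(key, self.entries[mid]) < 0:
                hi = mid
            else:
                lo = mid + 1
        entry = IndexEntry(key[: self.inline_size], link, len(key) > self.inline_size)
        self.entries.insert(lo, entry)


def count_reads(inline_size, keys):
    index = InlineIndex(inline_size)
    for link, key in enumerate(keys):
        index.insert(key, link)
    return index.data_page_reads


keys = [f"customer-{i:04d}" for i in range(100)]
print(count_reads(8, keys))   # every comparison ties on "customer" -> lookups
print(count_reads(16, keys))  # whole keys fit inline -> 0 lookups
```

With an 8-character inline size every key shares the inlined prefix `customer`, so each comparison during insertion turns into a data page read; once the inline area holds the whole key, no reads are needed at all.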

On Mon, May 7, 2018 at 10:58 AM, Dmitriy Setrakyan <dsetrakyan@apache.org>
wrote:

> Vladimir, my comments are inline...
>
> On Sat, May 5, 2018 at 6:12 AM, Vladimir Ozerov <vozerov@gridgain.com>
> wrote:
>
>> In general I do not support this initiative. There are two serious reasons
>> for that:
>> 1) Our indexes are slow on updates due to architectural flaws. First, every
>> index entry must be of fixed size. For this reason we cannot inline full
>> values in the general case and suffer from data page lookups [1]. Second,
>> final comparisons always compare primary keys, so another lookup is needed
>> [2]. Third, our indexes are fat because we lack prefix compression [3].
>>
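Prefix compression ([3]) can be illustrated with front coding, a common generic technique: in a sorted run of keys, each key is stored as the length of the prefix it shares with its predecessor plus the remaining suffix. This Python sketch is an assumption for illustration, not the scheme described in IEP-20, and it ignores the per-entry length headers a real page layout would need.

```python
def shared_prefix_len(a, b):
    """Number of leading characters a and b have in common."""
    n = 0
    limit = min(len(a), len(b))
    while n < limit and a[n] == b[n]:
        n += 1
    return n


def front_encode(sorted_keys):
    """Store each key as (length shared with previous key, remaining suffix)."""
    encoded, prev = [], ""
    for key in sorted_keys:
        shared = shared_prefix_len(prev, key)
        encoded.append((shared, key[shared:]))
        prev = key
    return encoded


def front_decode(encoded):
    """Rebuild the keys by reusing the previous key's prefix."""
    keys, prev = [], ""
    for shared, suffix in encoded:
        prev = prev[:shared] + suffix
        keys.append(prev)
    return keys


keys = [f"customer-{i:04d}" for i in range(100)]
encoded = front_encode(keys)
assert front_decode(encoded) == keys
print(sum(len(k) for k in keys))        # 1300 characters stored in full
print(sum(len(s) for _, s in encoded))  # 121 characters of suffixes
```

Because decoding walks the run sequentially, page formats that use front coding (LevelDB's block format, for example) keep periodic uncompressed "restart" keys so that lookups can still binary-search inside a page.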
>
> These all seem like great optimizations, and we should definitely do them.
> However, I am strongly of the opinion that even after these optimizations,
> data ingestion will be much slower with persistence turned on. Am I wrong?
>
>
>> 2) Some vendors do have memory-only indexes (SQL Server, Couchbase, and
>> MemSQL, to name a few), but they are memory-optimized: no pages, no BTrees.
>> A lock-free skiplist is used instead. This is the correct design, and it is
>> really fast. But we are very far from it at the moment.
>>
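For contrast with paged BTrees, the following is a minimal single-threaded skip list in Python: an ordered set built from towers of forward pointers, with no pages at all. It is deliberately not lock-free; lock-free variants typically replace the pointer updates below with atomic compare-and-swap operations (Java's `java.util.concurrent.ConcurrentSkipListMap` is a standard concurrent implementation). `SkipList` and its methods are hypothetical names.

```python
import random


class _Node:
    __slots__ = ("key", "forward")

    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level  # one next-pointer per level


class SkipList:
    """Single-threaded skip list (ordered set). NOT lock-free."""

    MAX_LEVEL = 16
    P = 0.5  # probability of promoting a node to the next level

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._head = _Node(None, self.MAX_LEVEL)
        self._level = 1

    def _random_level(self):
        level = 1
        while level < self.MAX_LEVEL and self._rng.random() < self.P:
            level += 1
        return level

    def insert(self, key):
        # update[i] = last node on level i whose key is < key
        update = [self._head] * self.MAX_LEVEL
        node = self._head
        for i in range(self._level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        nxt = node.forward[0]
        if nxt is not None and nxt.key == key:
            return False  # already present
        level = self._random_level()
        self._level = max(self._level, level)  # higher levels start at head
        new = _Node(key, level)
        for i in range(level):
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new
        return True

    def __contains__(self, key):
        node = self._head
        for i in range(self._level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
        node = node.forward[0]
        return node is not None and node.key == key

    def __iter__(self):
        node = self._head.forward[0]
        while node is not None:
            yield node.key
            node = node.forward[0]


sl = SkipList(seed=42)
for k in random.Random(7).sample(range(100), 100):
    sl.insert(k)
print(list(sl) == list(range(100)))  # True
```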
>
> I have not heard complaints about our BTree indexes being slow in memory.
> I only hear complaints about slowdowns whenever persistence is turned on
> and users are ingesting large amounts of data.
>
>
>> Taking this into account, I would not consider memory-only BTree indexes in
>> the near future. Instead, we should focus on performance. Once the things
>> mentioned above are fixed or implemented, our indexes will be both
>> memory-efficient and very fast to update.
>>
>
> I would agree with you only if there were no performance boost in the short
> term. So far, disabling persistence for indexes seems like a very simple
> change that could yield a significant performance boost.
>
>
>>
>> [1] https://issues.apache.org/jira/browse/IGNITE-8385
>> [2] https://issues.apache.org/jira/browse/IGNITE-8384
>> [3] https://cwiki.apache.org/confluence/display/IGNITE/IEP-20%3A+Data+Compression+in+Ignite#IEP-20:DataCompressioninIgnite-IndexPrefixCompression
>>
>> On Sat, May 5, 2018 at 3:46, Dmitriy Setrakyan <dsetrakyan@apache.org> wrote:
>>
>> > Igniters,
>> >
>> > One of the main complaints I hear from users is that whenever
>> > persistence is turned on, index creation can really slow down
>> > performance because of massive amounts of writes to disk. The reason
>> > Ignite writes indexes to disk is to support fast restarts: nothing
>> > needs to be rebuilt on startup, and Ignite can become operational right
>> > away.
>> >
>> > However, as far as I can tell, most users care about faster operations
>> > after the system is started and much less about startup speed. What if
>> > we added a mode where we do not persist indexes at all? This way data
>> > ingestion and overall throughput would significantly increase (at the
>> > cost, of course, of a longer startup time, because we have to rebuild
>> > the indexes).
>> >
>> > There are 2 ways to achieve this in Ignite. The simplest way is not to
>> > mark index pages dirty in memory, so they never participate in the
>> > checkpointing process. We also have to make sure that index pages never
>> > get evicted from memory. This can be done fairly quickly. The
>> > disadvantage of this approach is that if indexes fill up most of the
>> > memory, then it will be very difficult to find a page to evict, which
>> > may hurt performance.
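A toy Python model of the first approach, assuming a page cache with LRU eviction and a checkpoint that writes only dirty pages. `PageMemory`, its methods, and the `"index"`/`"data"` page kinds are hypothetical and do not reflect Ignite's actual page memory. Index pages are never marked dirty, so checkpoints skip them, and the eviction loop has to skip them as well; that is where the cost described above shows up, since the more of the cache index pages occupy, the more pages are scanned before a victim is found.

```python
from collections import OrderedDict


class PageMemory:
    """Toy page cache for approach 1: index pages are never dirty, never evicted."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # page_id -> {"kind", "dirty"}, in LRU order
        self.disk_writes = 0
        self.pages_scanned = 0      # cost of searching for an eviction victim

    def write(self, page_id, kind):
        if page_id not in self.pages:
            self._make_room()
            self.pages[page_id] = {"kind": kind, "dirty": False}
        self.pages.move_to_end(page_id)
        # Approach 1: only data pages are marked dirty, so a checkpoint
        # never writes index pages to disk.
        if kind == "data":
            self.pages[page_id]["dirty"] = True

    def checkpoint(self):
        written = 0
        for page in self.pages.values():
            if page["dirty"]:
                page["dirty"] = False
                written += 1
        self.disk_writes += written
        return written

    def _make_room(self):
        if len(self.pages) < self.capacity:
            return
        for page_id, page in self.pages.items():  # least recently used first
            self.pages_scanned += 1
            if page["kind"] == "index":
                continue  # an evicted index page could not be reloaded from disk
            if page["dirty"]:
                self.disk_writes += 1  # write the data page back before evicting
            del self.pages[page_id]
            return  # stop iterating right after mutating the dict
        raise MemoryError("no evictable page: memory is full of index pages")


mem = PageMemory(capacity=4)
for page_id, kind in [(1, "index"), (2, "index"), (3, "data"), (4, "index")]:
    mem.write(page_id, kind)
print(mem.checkpoint())   # 1: only data page 3 is written
mem.write(5, "data")      # skips index pages 1 and 2, then evicts page 3
print(mem.pages_scanned)  # 3
```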
>> >
>> > The other way is to have a separate in-memory off-heap region for
>> > indexes. This region should never be persisted. It may be a somewhat
>> > bigger refactoring, as we currently do not separate index pages from
>> > data pages. However, the advantage of this approach is that this region
>> > can be flushed to disk practically as is during a graceful shutdown of
>> > the node, and hence shorten the restart time.
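The second approach can be sketched the same way, assuming a node with a persisted data region and a memory-only index region (here a secondary index mapping values to keys). Checkpoints write only data; a graceful shutdown additionally dumps the index region, and a restart reuses that dump if it exists, otherwise it rebuilds the index by scanning the data. `Node`, `checkpoint`, `graceful_shutdown`, and the `disk` dict are hypothetical stand-ins, not Ignite APIs.

```python
import copy


class Node:
    """Toy node: persistent data region plus a memory-only index region."""

    def __init__(self, disk):
        self.disk = disk  # plain dict standing in for durable storage
        self.data = dict(disk.get("data", {}))
        # Consume the snapshot so a later crash cannot reload a stale index.
        snapshot = disk.pop("index_snapshot", None)
        self.rebuilt = snapshot is None
        self.index = self._rebuild() if self.rebuilt else snapshot

    def _rebuild(self):
        """Slow path: scan all data to recreate the secondary index."""
        index = {}
        for key, value in self.data.items():
            index.setdefault(value, set()).add(key)
        return index

    def put(self, key, value):
        if key in self.data:  # drop the old index entry on overwrite
            old = self.data[key]
            self.index[old].discard(key)
            if not self.index[old]:
                del self.index[old]
        self.data[key] = value
        self.index.setdefault(value, set()).add(key)

    def checkpoint(self):
        # Only the data region is persisted; the index region is never written.
        self.disk["data"] = dict(self.data)

    def graceful_shutdown(self):
        self.checkpoint()
        # Flush the index region "as is" so the next start can skip the rebuild.
        self.disk["index_snapshot"] = copy.deepcopy(self.index)


disk = {}
node = Node(disk)
node.put("a", 1)
node.put("b", 1)
node.put("c", 2)
node.graceful_shutdown()

restarted = Node(disk)
print(restarted.rebuilt)  # False: index loaded from the snapshot
restarted.put("d", 2)
restarted.checkpoint()    # then the node crashes: no snapshot is written

recovered = Node(disk)
print(recovered.rebuilt, sorted(recovered.index[2]))  # True ['c', 'd']
```

Consuming the snapshot at startup is a deliberate choice in this sketch: if the node later crashes, the next start falls back to a rebuild instead of loading an index that no longer matches the data.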
>> >
>> > I think we should start with the 1st approach and then think about the
>> > 2nd one. What do you think?
>> >
>> > D.
>> >
>>
>
>
