kudu-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Why does kudu not perform delete in the delta compaction?
Date Thu, 18 Aug 2016 18:19:42 GMT
On Thu, Aug 18, 2016 at 7:05 AM, fidel zheng <fidel.zheng@gmail.com> wrote:

> I just read the Kudu paper. I have a question about the delta
> compaction.
>
> Any given live row is in exactly one rowset, so the delete of this row is
> in the delta file of the same rowset. When the maintenance process does the
> delta compaction, it could perform the delete. Why doesn't it?
>

The issue is that delta compaction typically does not rewrite all of the
columns. For example, consider a schema like:

CREATE TABLE users (
  user_id string PRIMARY KEY,
  address string,
  biography string,
  phone_number string,
  last_login_ts int64
);

All of the non-PK fields might be updated, but the 'last_login_ts' will be
updated much more frequently. The major delta compaction process uses
update counts by column to see, in this case, that the last_login_ts column
is the only one that needs to be compacted (because the others likely
haven't received updates). This saves a lot of IO, especially in cases like
this, because the 'last_login_ts' column is likely to be much smaller than
other columns such as 'biography' or 'address'.
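To make the column-selection idea concrete, here is a minimal Python sketch, assuming a rowset's delta store can report how many pending updates each column has. The function name `pick_columns_to_compact` and the `min_updates` threshold are hypothetical; Kudu's real major delta compaction policy uses its own heuristics, so treat this as an illustration of the idea, not its implementation.

```python
from typing import Dict, List


def pick_columns_to_compact(update_counts: Dict[str, int],
                            min_updates: int = 1) -> List[str]:
    """Return the columns whose pending update count reaches min_updates.

    Hypothetical helper: a simplified stand-in for the per-column
    heuristic described above, not Kudu's actual policy.
    """
    # Only columns that accumulated enough updates are worth rewriting;
    # rarely-updated columns (often the large ones) are left alone,
    # which is where the IO savings come from.
    return [col for col, n in update_counts.items() if n >= min_updates]


# Example: last_login_ts is updated far more often than the other columns.
counts = {"address": 2, "biography": 0, "phone_number": 1,
          "last_login_ts": 1000}
print(pick_columns_to_compact(counts, min_updates=100))  # ['last_login_ts']
```

Only 'last_login_ts' is selected, so the large 'biography' and 'address' columns are never read or rewritten.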

To get back to your question, then: if we are compacting only a subset of
columns, it wouldn't be possible to garbage-collect a deleted row. Consider
the data:


  user_id       address               last_login_ts   pending deltas
  tlipcon       "433 California St"   12345           [UPDATE: last_login_ts = 22345]
  deleted_user  "210 Portage Ave"     54321           [DELETED]
  other_user    "151 W 26th St"       90000

and we want to major-compact the 'last_login_ts' column. Since we only read
and rewrite that column, imagine what would happen if we processed the
delete:


  tlipcon       "433 California St"   22345
  deleted_user  "210 Portage Ave"     90000
  other_user    "151 W 26th St"       ??????

The columns of a rowset are matched up purely by row position, so the
compacted column would now be one value "too short": every value after the
deleted row would shift up by one and end up paired with the wrong row.
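The misalignment can be reproduced with a small simulation. This is a minimal Python sketch, assuming a rowset is modeled as a dict of equal-length column lists joined purely by row position (ordinal); the names `major_compact` and `read_row` are hypothetical, and real Kudu keeps columns in on-disk column files plus delta files, not Python lists.

```python
from typing import Dict, List, Set

# One rowset's base data: row i is formed by taking element i of every column.
Rowset = Dict[str, List[object]]


def major_compact(rowset: Rowset, columns: List[str],
                  updates: Dict[int, Dict[str, object]],
                  deleted: Set[int], drop_deleted: bool) -> Rowset:
    """Rewrite only `columns`, applying updates and optionally dropping deletes.

    Hypothetical model: columns not listed are kept exactly as they were.
    """
    out = dict(rowset)
    for col in columns:
        new_col = []
        for ordinal, value in enumerate(rowset[col]):
            if drop_deleted and ordinal in deleted:
                continue  # removes the row from THIS column only
            new_col.append(updates.get(ordinal, {}).get(col, value))
        out[col] = new_col
    return out


def read_row(rowset: Rowset, ordinal: int) -> tuple:
    # Reassemble a row by ordinal; a too-short column yields '??????'.
    return tuple(c[ordinal] if ordinal < len(c) else "??????"
                 for c in rowset.values())


base = {
    "user_id": ["tlipcon", "deleted_user", "other_user"],
    "address": ["433 California St", "210 Portage Ave", "151 W 26th St"],
    "last_login_ts": [12345, 54321, 90000],
}
updates = {0: {"last_login_ts": 22345}}
deleted = {1}

# Compacting one column AND dropping the delete misaligns the rows.
bad = major_compact(base, ["last_login_ts"], updates, deleted, drop_deleted=True)
for i in range(3):
    print(read_row(bad, i))
# ('tlipcon', '433 California St', 22345)
# ('deleted_user', '210 Portage Ave', 90000)
# ('other_user', '151 W 26th St', '??????')

# Keeping the deleted row (its DELETE delta stays) preserves alignment.
ok = major_compact(base, ["last_login_ts"], updates, deleted, drop_deleted=False)

# Rewriting every column can drop the row safely, since all columns shrink together.
full = major_compact(base, list(base), updates, deleted, drop_deleted=True)
```

In this model, a deleted row can only be garbage-collected by a compaction that rewrites every column of the rowset; a column-subset compaction has to leave the row in place and keep its delete marker.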

Hope that helps
-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera
