hive-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-20380) LLAP cache should cache small buffers more efficiently
Date Sat, 25 Aug 2018 00:59:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-20380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HIVE-20380:
------------------------------------
    Description: 
Lately, ORC CBs have become ridiculously small. First there's the 4KB minimum allocation (instead of 256KB); then, after we moved the metadata cache off-heap, the index streams, which are all tiny, take up a lot of CBs and waste space.
Wasted space can require a larger cache and lead to cache OOMs on some workloads.
Reducing min.alloc solves this problem, but then there's a lot of heap (and probably compute) overhead to track all these buffers. Arguably even the 4KB min.alloc is too small.

The initial idea was to store multiple CBs per block; however, that is a lot of work all over the place (cache mapping, cache lookups, everywhere in the readers, etc.).
The new idea (see comments) is to consolidate and reduce allocation sizes once the "real" size is known after decompression. This can be confined to the allocator and is also more flexible - there is no dependence on the cache map, so we don't need to ensure things are contiguous (for example, the R_I streams we want to consolidate are interleaved with large bloom filters that we don't want to read or consolidate when they are not needed; but the cache key structure depends on offsets, so we'd need a new cache map for R_I and separate logic for these streams). Also, streams like PRESENT with one small CB realistically cannot be combined with anything, but shrinking the allocation will help them.
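As a rough sketch of the shrink-after-decompression idea - all class and method names below are invented for illustration, and the buddy-allocator internals are assumed, not the actual LLAP allocator API:

```java
import java.nio.ByteBuffer;

// Illustrative sketch of shrinking an allocation once the real
// post-decompression size is known. A buddy allocator rounds requests up
// to a power of two (>= some minimum); if the decompressed payload is much
// smaller than the block it landed in, we copy it into a right-sized
// allocation and release the oversized one.
class ShrinkingAllocatorSketch {
    static final int MIN_ALLOC = 4 * 1024; // the 4KB minimum from the description

    /** Rounds n up to the next power of two (n >= 1). */
    static int roundUpToPow2(int n) {
        int p = Integer.highestOneBit(n);
        return p == n ? p : p << 1;
    }

    /** Returns a right-sized buffer holding the first realSize bytes of block. */
    static ByteBuffer maybeShrink(ByteBuffer block, int realSize) {
        int needed = Math.max(MIN_ALLOC, roundUpToPow2(realSize));
        if (needed >= block.capacity()) {
            return block; // already as small as the allocator allows
        }
        ByteBuffer smaller = ByteBuffer.allocate(needed); // stand-in for a real sub-allocation
        ByteBuffer src = block.duplicate();
        src.position(0);
        src.limit(realSize);
        smaller.put(src);
        smaller.flip();
        // A real allocator would return 'block' to its free lists here.
        return smaller;
    }
}
```

In the real allocator the shrink would have to happen before the buffer is published to the cache map, so readers never observe the swap - which is what keeps the change confined to the allocator.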

There are also minor heap improvements possible:
1) Intern the tracking tag.
2) Replace the AtomicLong object with a plain long and an unsafe CAS; we only ever use one method, compareAndSwap.
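Item (2) can be sketched as follows; this is a hypothetical illustration (class and field names are not from the Hive tree) using VarHandle, the supported stand-in for a sun.misc.Unsafe CAS:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Sketch of heap improvement (2): track the counter as a plain long field
// with an explicit CAS instead of an AtomicLong object. This saves one
// object header plus one reference per tracked buffer.
class BufferTrackerSketch {
    private volatile long state;

    private static final VarHandle STATE;
    static {
        try {
            STATE = MethodHandles.lookup()
                .findVarHandle(BufferTrackerSketch.class, "state", long.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** The single operation the tracking code needs: compare-and-swap. */
    boolean casState(long expected, long update) {
        return STATE.compareAndSet(this, expected, update);
    }

    long getState() {
        return state;
    }
}
```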

One more idea is to make tracking less object-oriented - in particular, passing around integer indexes instead of objects and storing state in giant arrays somewhere (potentially with some optimizations for less common things), instead of every buffer getting its own object.
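A minimal sketch of that index-based tracking idea (the field layout and names are invented for illustration, and real code would also need thread safety and slot reuse):

```java
// Sketch of "less object-oriented" tracking: instead of one tracking
// object per cached buffer, state lives in parallel arrays and callers
// pass around int indexes.
class FlatBufferStateSketch {
    private final long[] refCounts;
    private final int[] allocSizes;
    private int next = 0;

    FlatBufferStateSketch(int capacity) {
        refCounts = new long[capacity];
        allocSizes = new int[capacity];
    }

    /** Registers a buffer and returns its index; no per-buffer object is created. */
    int register(int allocSize) {
        int ix = next++;
        allocSizes[ix] = allocSize;
        refCounts[ix] = 0;
        return ix;
    }

    void ref(int ix)      { refCounts[ix]++; }
    void unref(int ix)    { refCounts[ix]--; }
    long refCount(int ix) { return refCounts[ix]; }
    int allocSize(int ix) { return allocSizes[ix]; }
}
```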

cc [~gopalv] [~prasanth_j]



  was:
Lately ORC CBs are becoming ridiculously small. First there's the 4Kb minimum (instead of
256Kb), then after we moved metadata cache off-heap, the index streams that are all tiny take
up a lot of CBs and waste space. 
Wasted space can require larger cache and lead to cache OOMs on some workloads.
Reducing min.alloc solves this problem, but then there's a lot of heap (and probably compute)
overhead to track all these buffers. Arguably even the 4Kb min.alloc is too small.

The initial idea was to store multiple CBs per block, however this is a lot of work all over
the place (cache mapping, cache lookups, everywhere in the readers, etc.). 
Consolidating and reducing allocation sizes after we know the "real" size after decompression
is the new idea (see comments).

There are also minor heap improvements possible.
1) Intern tracking tag.
2) Replace AtomicLong object with a long and unsafe CAS method, we only ever use one method,
compareAndSwap.

One more idea is making tracking less object oriented, in particular passing around integer
indexes instead of objects and storing state in giant arrays somewhere (potentially with some
optimizations for less common things), instead of every buffers getting its own object. 

cc [~gopalv] [~prasanth_j]




> LLAP cache should cache small buffers more efficiently
> ------------------------------------------------------
>
>                 Key: HIVE-20380
>                 URL: https://issues.apache.org/jira/browse/HIVE-20380
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>            Priority: Major
>
> Lately, ORC CBs have become ridiculously small. First there's the 4KB minimum allocation (instead of 256KB); then, after we moved the metadata cache off-heap, the index streams, which are all tiny, take up a lot of CBs and waste space.
> Wasted space can require a larger cache and lead to cache OOMs on some workloads.
> Reducing min.alloc solves this problem, but then there's a lot of heap (and probably compute) overhead to track all these buffers. Arguably even the 4KB min.alloc is too small.
> The initial idea was to store multiple CBs per block; however, that is a lot of work all over the place (cache mapping, cache lookups, everywhere in the readers, etc.).
> The new idea (see comments) is to consolidate and reduce allocation sizes once the "real" size is known after decompression. This can be confined to the allocator and is also more flexible - there is no dependence on the cache map, so we don't need to ensure things are contiguous (for example, the R_I streams we want to consolidate are interleaved with large bloom filters that we don't want to read or consolidate when they are not needed; but the cache key structure depends on offsets, so we'd need a new cache map for R_I and separate logic for these streams). Also, streams like PRESENT with one small CB realistically cannot be combined with anything, but shrinking the allocation will help them.
> There are also minor heap improvements possible:
> 1) Intern the tracking tag.
> 2) Replace the AtomicLong object with a plain long and an unsafe CAS; we only ever use one method, compareAndSwap.
> One more idea is to make tracking less object-oriented - in particular, passing around integer indexes instead of objects and storing state in giant arrays somewhere (potentially with some optimizations for less common things), instead of every buffer getting its own object.
> cc [~gopalv] [~prasanth_j]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
