hive-issues mailing list archives

From "Peter Vary (JIRA)" <>
Subject [jira] [Commented] (HIVE-21305) LLAP: Option to skip cache for ETL queries
Date Mon, 25 Feb 2019 11:41:00 GMT


Peter Vary commented on HIVE-21305:

 * Read through cache - ok - got it :)
 * Consider the following query:
insert into ETL_1 values
    select, fact.value, dim.value from fact, dim where;
We might want to cache the dim table, since that might be reused in another query, but we
might not want to cache the fact table.

 * Small tables vs. big tables cache: I might be wrong, but my assumption was that reading
files has a constant access-time overhead plus a size-based reading time. If my
assumption is correct, we might be better off caching the small tables (provided they are
reused later), since this saves us the constant access time. Since they have a
smaller memory footprint, we can store more of them in the cache, so the size is not that
much of a factor.
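
To make the assumption in the last bullet concrete, here is a minimal Python sketch of that cost model. The overhead and throughput values, the table sizes, the reuse count, and all names are made up for illustration and are not measured on LLAP. It also assumes a cached read costs nothing, which is a simplification:

```python
from dataclasses import dataclass


@dataclass
class Table:
    name: str
    size_mb: float  # size of the data a query reads from this table


# Hypothetical numbers, for illustration only (not measured on LLAP).
ACCESS_OVERHEAD_S = 0.05     # assumed constant per-read access cost
THROUGHPUT_MB_PER_S = 100.0  # assumed size-based read rate


def read_time_s(table: Table) -> float:
    """Uncached read time: constant overhead plus size-based time."""
    return ACCESS_OVERHEAD_S + table.size_mb / THROUGHPUT_MB_PER_S


def saved_s_per_cached_mb(table: Table, reuses: int) -> float:
    """Read time avoided per MB of cache spent, if the table is reused `reuses` times."""
    return reuses * read_time_s(table) / table.size_mb


dim = Table("dim", size_mb=10.0)
fact = Table("fact", size_mb=10_000.0)

for t in (dim, fact):
    print(f"{t.name}: read {read_time_s(t):.2f}s, "
          f"saves {saved_s_per_cached_mb(t, reuses=3):.4f}s per cached MB")
```

With these numbers the constant overhead dominates for dim, so each cached MB of dim saves more read time than a cached MB of fact. With zero overhead the two would tie, so the argument depends on the overhead being noticeable and on the small tables really being reused.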

Disclaimer: All of the above is based on limited data; you have more experience here.


> LLAP: Option to skip cache for ETL queries
> ------------------------------------------
>                 Key: HIVE-21305
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>          Components: llap
>    Affects Versions: 4.0.0
>            Reporter: Prasanth Jayachandran
>            Priority: Major
> To keep ETL queries from polluting the cache, it would be good to detect such queries at
> compile time and optionally skip LLAP IO for such queries.
> org.apache.hadoop.hive.ql.parse.QBParseInfo.hasInsertTables() is the simplest way to
> catch ETL queries.

