hadoop-yarn-dev mailing list archives

From "Akash R Nilugal (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-9753) Cache Pre-Priming
Date Fri, 16 Aug 2019 05:39:00 GMT
Akash R Nilugal created YARN-9753:

             Summary: Cache Pre-Priming
                 Key: YARN-9753
                 URL: https://issues.apache.org/jira/browse/YARN-9753
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Akash R Nilugal

Currently, we have an index server that provides distributed caching of the datamaps
in a separate Spark application.

Caching of the datamaps in the index server starts only when a query is fired on the table
for the first time. If a count(*) query is fired, all the datamaps are loaded; for a filter
query, only the required datamaps are loaded.
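A minimal Python sketch of the on-demand behaviour described above. It assumes a table is just a list of segment IDs and that a filter query already knows which segments it needs (i.e. pruning has happened elsewhere). `LazyDatamapCache`, `load_datamap` and the segment IDs are hypothetical names for illustration, not the index server's actual API.

```python
# Hypothetical sketch of lazy (on-demand) datamap caching.
# LazyDatamapCache and load_datamap are illustrative names only.

def load_datamap(table, segment):
    """Stand-in for reading one segment's datamap from storage."""
    return f"datamap:{table}/{segment}"


class LazyDatamapCache:
    def __init__(self, segments_by_table):
        self.segments_by_table = segments_by_table  # table -> list of segment ids
        self.cache = {}  # (table, segment) -> datamap
        self.loads = 0   # how many datamaps were read from storage

    def _get(self, table, segment):
        key = (table, segment)
        if key not in self.cache:  # cache miss: load on first access only
            self.cache[key] = load_datamap(table, segment)
            self.loads += 1
        return self.cache[key]

    def count_star(self, table):
        # count(*) touches every segment, so every datamap gets loaded.
        return [self._get(table, s) for s in self.segments_by_table[table]]

    def filter_query(self, table, required_segments):
        # A filter query loads only the segments it needs (assumed already pruned).
        return [self._get(table, s) for s in required_segments]


cache = LazyDatamapCache({"sales": ["0", "1", "2", "3"]})
cache.filter_query("sales", ["1"])  # loads 1 datamap
cache.count_star("sales")           # loads the remaining 3
```

Nothing is cached until the first query arrives, which is exactly the cost the first query on a freshly loaded table has to pay.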

The bottleneck here is that, until a query is fired on the table, the table's datamaps are
not cached at all.

Consider a scenario where we load data into a table for a whole day and query it only the
next day. All the segments then start loading into the cache at query time, so the first
query will be slow.

What if we load the datamaps into the cache, i.e. pre-prime the cache, without waiting for
any query on the table?

For example, we could load the cache after every data load completes, or load the cache for
all the segments at once, so that the first query does not have to do this work and
therefore runs faster.
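A minimal Python sketch of the two pre-priming options above, assuming a hypothetical hook `on_load_complete` that runs after each data load and a hypothetical `preprime_table` that warms every known segment at once. These names are illustrative only and are not taken from the attached design document or the index server's code; `misses` counts datamaps that a query still had to load itself.

```python
# Hypothetical sketch of cache pre-priming; all names are illustrative.

def load_datamap(table, segment):
    """Stand-in for reading one segment's datamap from storage."""
    return f"datamap:{table}/{segment}"


class PrePrimedDatamapCache:
    def __init__(self):
        self.segments = {}  # table -> list of segment ids
        self.cache = {}     # (table, segment) -> datamap
        self.loads = 0      # datamaps read from storage
        self.misses = 0     # datamaps a query had to load itself

    def _load(self, table, segment):
        key = (table, segment)
        if key not in self.cache:
            self.cache[key] = load_datamap(table, segment)
            self.loads += 1

    def on_load_complete(self, table, segment, preprime=True):
        # Option 1: after each data load, register the segment and cache it now.
        self.segments.setdefault(table, []).append(segment)
        if preprime:
            self._load(table, segment)

    def preprime_table(self, table):
        # Option 2: warm the cache for all segments of a table at once.
        for segment in self.segments.get(table, []):
            self._load(table, segment)

    def count_star(self, table):
        result = []
        for segment in self.segments.get(table, []):
            key = (table, segment)
            if key not in self.cache:  # not pre-primed: query pays the cost
                self.misses += 1
                self._load(table, segment)
            result.append(self.cache[key])
        return result


cache = PrePrimedDatamapCache()
for seg in ["0", "1", "2"]:  # a day's worth of loads
    cache.on_load_complete("sales", seg)
cache.count_star("sales")     # next day: every datamap is already cached
```

Pre-priming at load time spreads the loading cost across the day's loads, while a bulk warm-up suits tables whose segments were loaded before pre-priming was enabled.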

I have attached the design document for pre-priming the cache in the index server. Please
have a look at it.

This message was sent by Atlassian JIRA

