tez-dev mailing list archives

From Hitesh Shah <hit...@apache.org>
Subject Re: Distributed Cache in Tez
Date Fri, 26 Jul 2013 21:11:05 GMT
Hi Achal 

We want to force folks to use local resources as it makes the users more aware of how to use
the cache. 

Pushing local files to the distributed cache for each job does not bring any performance improvement.
All it does is ensure that the local files are available on the remote node in the cluster
where the task is run. It also requires uploading the local files to HDFS each and every time.
And because there is a new HDFS file each and every time, the "cache"
on the remote node cannot be re-used.

With local resources, the user is making a conscious choice of first uploading a local file
to hdfs and then adding the hdfs file as a local resource for the remote task. As long as
the file on HDFS remains unchanged, the remote node will re-use the local copy (the local copy
is downloaded from HDFS once, the first time around). With this in mind, a user will be more
mindful of when to upload a local file and how to re-use HDFS-based resources across jobs.
A user would now realize the penalty of uploading a non-changing jar for each and every
job (as was done by Hive earlier).
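
To make the difference concrete, here is a toy model in Python. It is not Tez or YARN
code, only a sketch under the assumption that a node treats a localized file as already
cached when its HDFS path, size and modification timestamp all match. The names
ResourceKey and NodeCache and the paths used are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ResourceKey:
    """Identity of a localized file in this toy model (an assumption)."""
    hdfs_path: str
    size: int
    timestamp: int


@dataclass
class NodeCache:
    """Hypothetical stand-in for the per-node cache of localized files."""
    entries: set = field(default_factory=set)
    downloads: int = 0

    def localize(self, key: ResourceKey) -> None:
        # Download from HDFS only when this exact file version is not cached.
        if key not in self.entries:
            self.entries.add(key)
            self.downloads += 1


def run_jobs_with_fresh_upload(cache: NodeCache, jobs: int) -> None:
    # Per-job upload: every job creates a new HDFS file for the same jar.
    for job_id in range(jobs):
        key = ResourceKey(f"/tmp/job_{job_id}/app.jar", 1024, 1000 + job_id)
        cache.localize(key)


def run_jobs_with_shared_resource(cache: NodeCache, jobs: int) -> None:
    # Local-resource pattern: upload once, then reference the same HDFS file.
    shared_key = ResourceKey("/apps/lib/app.jar", 1024, 1000)
    for _ in range(jobs):
        cache.localize(shared_key)


fresh = NodeCache()
run_jobs_with_fresh_upload(fresh, 5)

shared = NodeCache()
run_jobs_with_shared_resource(shared, 5)

# Downloads needed on the node: per-job upload vs. shared HDFS resource.
print(fresh.downloads, shared.downloads)
```

In this model the per-job upload forces a download for every job, while the shared HDFS
file is downloaded only for the first job and re-used afterwards.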

In the case of helpers, are you looking at a helper method for creating local resources out
of files that change for each and every job? 

Furthermore, there is the question of managing these uploaded files. When should they be
deleted - after the job completes? If yes, is the AM supposed to delete them or the client?
What if a client does not hang around for the job to complete or is killed before it can clean
up the files?   

-- Hitesh

On Jul 26, 2013, at 1:59 PM, Achal Soni wrote:

> Hey all,
> Has any thought been given to distributed cache in Tez? It seems that it is
> almost as simple as adding local files to vertices via YARN.
> Is there any insight into how DistributedCache differs from adding
> LocalResources? Should I be looking into MRApps for helper methods?
> Thanks!
> Achal
