tez-dev mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [tez] okumin opened a new pull request #79: TEZ-4246[WIP]: Avoid uneven local disk usage for spills
Date Tue, 10 Nov 2020 05:54:44 GMT

okumin opened a new pull request #79:
URL: https://github.com/apache/tez/pull/79


   https://issues.apache.org/jira/browse/TEZ-4246
   
   When there are only two disks, the current implementation is likely to write all spill data
to one of them and all index files to the other. Every `file.out`, which is larger than its
`file.out.index`, therefore ends up on the same disk.
   
   1. write spill data on `/data/0/..../file.out`
   2. write a spill index file on the other directory, `/data/1/.../file.out.index`
   3. write spill data on `/data/0/..../file.out`
   4. ...
   
   This PR changes the behavior so that both disks are used more evenly.
   
   1. write spill data on `/data/0/..../file.out`
   2. write the spill index file on the same directory, `/data/0/.../file.out.index`
   3. write spill data on `/data/1/..../file.out`
   4. ...
   
   Index files are relatively small, so I think it's reasonable to put them in the same directory
as `file.out`.
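
To make the difference between the two sequences above concrete, here is a toy simulation. It is only a sketch under stated assumptions: `RoundRobinDirs` is a hypothetical stand-in that hands out directories in strict rotation, whereas the real allocator may also take free space into account, and the paths are shortened to `/data/0` and `/data/1` for illustration. None of the class or method names below come from the Tez code base.

```java
import java.util.ArrayList;
import java.util.List;

public class SpillPlacementSketch {
    // Hypothetical allocator: strict round-robin over the configured local dirs.
    // This is an assumption for illustration, not the actual Tez/Hadoop logic.
    static final class RoundRobinDirs {
        private final String[] dirs;
        private int next = 0;

        RoundRobinDirs(String... dirs) {
            this.dirs = dirs;
        }

        String allocate() {
            String dir = dirs[next];
            next = (next + 1) % dirs.length;
            return dir;
        }
    }

    // Behavior described as current: the data file and the index file each
    // request a directory separately, so the rotation advances twice per spill.
    static List<String> separateAllocation(int spills, String... dirs) {
        RoundRobinDirs alloc = new RoundRobinDirs(dirs);
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < spills; i++) {
            paths.add(alloc.allocate() + "/file.out");
            paths.add(alloc.allocate() + "/file.out.index");
        }
        return paths;
    }

    // Behavior described as proposed: one directory request per spill, with
    // the index file placed next to its data file.
    static List<String> colocatedAllocation(int spills, String... dirs) {
        RoundRobinDirs alloc = new RoundRobinDirs(dirs);
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < spills; i++) {
            String dir = alloc.allocate();
            paths.add(dir + "/file.out");
            paths.add(dir + "/file.out.index");
        }
        return paths;
    }

    // Counts how many spill data files (not index files) landed in a directory.
    static long countDataFilesOn(List<String> paths, String dir) {
        String target = dir + "/file.out";
        return paths.stream().filter(target::equals).count();
    }

    public static void main(String[] args) {
        String[] dirs = {"/data/0", "/data/1"};
        List<String> before = separateAllocation(4, dirs);
        List<String> after = colocatedAllocation(4, dirs);
        for (String dir : dirs) {
            System.out.println(dir + ": separate=" + countDataFilesOn(before, dir)
                    + ", colocated=" + countDataFilesOn(after, dir));
        }
    }
}
```

With two directories and four spills, the separate strategy in this model puts all four `file.out` files on `/data/0`, while the colocated strategy puts two on each disk. With an odd number of directories the separate strategy would not line up this way, which matches the note that the problem shows up when there are just two disks.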


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


