tez-dev mailing list archives

From "Ming Ma (JIRA)" <j...@apache.org>
Subject [jira] [Created] (TEZ-3577) DefaultSorter doesn't compute RLE properly
Date Sat, 14 Jan 2017 04:17:26 GMT
Ming Ma created TEZ-3577:

             Summary: DefaultSorter doesn't compute RLE properly
                 Key: TEZ-3577
                 URL: https://issues.apache.org/jira/browse/TEZ-3577
             Project: Apache Tez
          Issue Type: Bug
            Reporter: Ming Ma

RLE is enabled when sameKeyCount exceeds a certain threshold. However, sameKeyCount is computed
during sorter.sort. When flush invokes the following function for the only spill, the
sameKeyCount argument is therefore 0, because no sort has happened yet. Once sorter.sort has
run, DefaultSorter#sameKey is up to date, and that value should be passed to the spill function
instead of the stale parameter.

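  protected void sortAndSpill(long sameKeyCount, long totalKeysCount)
      throws IOException, InterruptedException {
    final int mstart = getMetaStart();
    final int mend = getMetaEnd();
    // The sort is what updates DefaultSorter#sameKey.
    sorter.sort(this, mstart, mend, progressable);
    // Problem: sameKeyCount was captured by the caller before the sort ran,
    // so it is still 0 when flush triggers the only spill.
    spill(mstart, mend, sameKeyCount, totalKeysCount);
  }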
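
To make the ordering problem concrete, here is a minimal, self-contained sketch. It is not Tez code: the class RleSpillSketch, its methods, the integer keys, and the 0.1 ratio threshold are all hypothetical stand-ins chosen for illustration. It only mirrors the reported pattern, where a duplicate-key counter is filled in by the sort but the value handed to the spill step was read before the sort ran.

```java
import java.util.Arrays;

// Hypothetical stand-in for the sorter described above; not the Tez implementation.
class RleSpillSketch {
    // Assumed threshold for illustration only; the real value is not given in the report.
    static final double RLE_THRESHOLD = 0.1;

    private final int[] keys;
    private long sameKey = 0; // plays the role of DefaultSorter#sameKey

    RleSpillSketch(int[] keys) {
        this.keys = keys.clone();
    }

    // Sorts the keys and, as a side effect, counts adjacent duplicates.
    void sort() {
        Arrays.sort(keys);
        sameKey = 0;
        for (int i = 1; i < keys.length; i++) {
            if (keys[i] == keys[i - 1]) {
                sameKey++;
            }
        }
    }

    // Decides whether RLE would be used for a spill.
    static boolean useRle(long sameKeyCount, long totalKeysCount) {
        return totalKeysCount > 0
                && (double) sameKeyCount / totalKeysCount > RLE_THRESHOLD;
    }

    // Reported pattern: the caller's pre-sort count is passed through unchanged.
    boolean sortAndSpillAsReported(long sameKeyCount, long totalKeysCount) {
        sort();
        return useRle(sameKeyCount, totalKeysCount);
    }

    // Suggested pattern: read the counter only after the sort has updated it.
    boolean sortAndSpillWithFreshCount(long totalKeysCount) {
        sort();
        return useRle(sameKey, totalKeysCount);
    }

    public static void main(String[] args) {
        int[] keys = {3, 1, 3, 3, 2, 1};
        // The caller has not sorted yet, so its view of the duplicate count is 0.
        boolean stale = new RleSpillSketch(keys).sortAndSpillAsReported(0, keys.length);
        boolean fresh = new RleSpillSketch(keys).sortAndSpillWithFreshCount(keys.length);
        System.out.println("RLE with stale count: " + stale);
        System.out.println("RLE with fresh count: " + fresh);
    }
}
```

In this sketch the stale path never enables RLE for a single spill, no matter how many duplicate keys the data holds, while the fresh path does once the duplicate ratio exceeds the assumed threshold.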
