drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-5636) External sort should not copy data prior to spilling
Date Fri, 30 Jun 2017 05:11:00 GMT
Paul Rogers created DRILL-5636:

             Summary: External sort should not copy data prior to spilling
                 Key: DRILL-5636
                 URL: https://issues.apache.org/jira/browse/DRILL-5636
             Project: Apache Drill
          Issue Type: Improvement
    Affects Versions: 1.8.0
            Reporter: Paul Rogers
            Assignee: Paul Rogers
            Priority: Minor

The external sort spills data to disk under memory pressure. The sort code uses a generic
mechanism to do the spilling:

* Use a "priority queue copier" to copy sorted records into a new batch
* Spill the new batch by writing the vectors for the newly-created batch

The above works fine when memory is plentiful. But, under low-memory conditions, the intermediate
copy can cause OOM errors.

An improved algorithm is:

* Priority queue copier works vector-by-vector
* Serialize each vector to disk
* Release its memory
* Repeat for the next vector

The advantages of the above:

* Less intermediate memory use
* Perhaps better CPU cache performance through greater locality (all writes happen to a single
vector at a time, rather than row by row)
* No change in disk format or disk write performance (because data is buffered prior to write

This message was sent by Atlassian JIRA

View raw message