drill-dev mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (DRILL-5022) ExternalSortBatch sets two different limits for "copier" memory
Date Fri, 17 Feb 2017 22:22:44 GMT

     [ https://issues.apache.org/jira/browse/DRILL-5022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Paul Rogers resolved DRILL-5022.
    Resolution: Fixed

> ExternalSortBatch sets two different limits for "copier" memory
> ---------------------------------------------------------------
>                 Key: DRILL-5022
>                 URL: https://issues.apache.org/jira/browse/DRILL-5022
>             Project: Apache Drill
>          Issue Type: Sub-task
>    Affects Versions: 1.8.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
> The {{ExternalSortBatch}} (ESB) operator sorts rows and supports spilling to disk to
operate within a set memory budget.
> A key step in disk-based sorting is to merge "runs" of previously-sorted records. ESB
does this with a class created from the {{PriorityQueueCopierTemplate}}, called the "copier"
in the code.
> The sort runs are represented by record batches, each with an indirection vector (AKA
{{SelectionVector}}) that points to the records in sort order.
> The copier restructures the incoming runs: copying from the original batches (from positions
given by the indirection vector) into new output vectors in sorted order. To do this work,
the copier must allocate new vectors to hold the merged data. These vectors consume memory,
and must fit into the overall memory budget assigned to the ESB.
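To illustrate the merge step described above, here is a minimal sketch of a priority-queue merge over runs that carry an indirection vector. It is not Drill's {{PriorityQueueCopierTemplate}}: the {{Run}} and {{MergeSketch}} classes, the plain {{int}} records, and the record-count cap on each output batch are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for one sort run: records stay in arrival order and
// an indirection vector lists their positions in sorted order.
class Run {
    final int[] values;     // records as they arrived (unsorted)
    final int[] selection;  // indexes into values, in sort order
    int next = 0;           // cursor into the indirection vector

    Run(int[] values, int[] selection) {
        this.values = values;
        this.selection = selection;
    }

    boolean hasNext() { return next < selection.length; }
    int peek() { return values[selection[next]]; }
    int pop() { return values[selection[next++]]; }
}

class MergeSketch {
    // Merge the runs into newly allocated output batches, each holding at
    // most maxRecords records (an assumed cap standing in for a byte limit).
    static List<int[]> merge(List<Run> runs, int maxRecords) {
        PriorityQueue<Run> queue =
            new PriorityQueue<>((a, b) -> Integer.compare(a.peek(), b.peek()));
        for (Run run : runs) {
            if (run.hasNext()) {
                queue.add(run);
            }
        }
        List<int[]> batches = new ArrayList<>();
        while (!queue.isEmpty()) {
            int[] out = new int[maxRecords];  // the new output "vector"
            int count = 0;
            while (count < maxRecords && !queue.isEmpty()) {
                Run run = queue.poll();       // run with the smallest head record
                out[count++] = run.pop();     // copy via the indirection vector
                if (run.hasNext()) {
                    queue.add(run);           // re-queue under its new head record
                }
            }
            batches.add(Arrays.copyOf(out, count));
        }
        return batches;
    }

    public static void main(String[] args) {
        Run a = new Run(new int[] {30, 10, 20}, new int[] {1, 2, 0});
        Run b = new Run(new int[] {15, 5, 25}, new int[] {1, 0, 2});
        for (int[] batch : merge(Arrays.asList(a, b), 4)) {
            System.out.println(Arrays.toString(batch));
        }
    }
}
```

Each output batch is a fresh allocation, which is why the size chosen for a merged batch translates directly into the memory the copier needs.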
> As it turns out, the ESB code has two conflicting ways of setting the limit. One is hard-coded:
> {code}
>   private static final int COPIER_BATCH_MEM_LIMIT = 256 * 1024;
> {code}
> The other comes from config parameters:
> {code}
>   // Constants declared in PriorityQueueCopier:
>   public static final long INITIAL_ALLOCATION = 10_000_000;
>   public static final long MAX_ALLOCATION = 20_000_000;
>   // Where the copier's allocator is created, using those constants:
>   copierAllocator = oAllocator.newChildAllocator(oAllocator.getName() + ":copier",
>       PriorityQueueCopier.INITIAL_ALLOCATION, PriorityQueueCopier.MAX_ALLOCATION);
> {code}
> Strangely, the config parameters are used to set aside memory for the copier to use.
But the {{COPIER_BATCH_MEM_LIMIT}} is used to determine how large a merged batch to actually
create.
> The result is that we set aside 10 MB of memory but use only 256K of it, wasting more than 9 MB.
> This ticket asks to:
> * Determine the proper merged batch size.
> * Use that limit to set the memory allocation for the copier.
> Elsewhere in Drill, batch sizes tend to be on the order of 32K records. In the ESB, the
low {{COPIER_BATCH_MEM_LIMIT}} tends to favor smaller batches: a test case has a row width
of 114 bytes and produces batches of just 2299 records. So the proper choice is likely the
larger 10 MB memory allocator limit.
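The figures in the ticket can be checked with simple arithmetic. The sketch below assumes that a merged batch holds as many whole rows as fit under the byte limit, which reproduces the 2299-record figure; the actual ESB sizing logic may account for overhead not modeled here. The class and method names are hypothetical, and only the two constants come from the ticket.

```java
class CopierBatchMath {
    // Values quoted in the ticket.
    static final long COPIER_BATCH_MEM_LIMIT = 256 * 1024;  // 262,144 bytes
    static final long INITIAL_ALLOCATION = 10_000_000;      // about 10 MB

    // Whole rows that fit under a byte limit (assumes fixed-width rows and
    // no per-vector overhead).
    static long recordsPerBatch(long memLimitBytes, int rowWidthBytes) {
        return memLimitBytes / rowWidthBytes;
    }

    // Memory set aside for the copier but never used by a merged batch.
    static long unusedBytes(long reservedBytes, long batchLimitBytes) {
        return reservedBytes - batchLimitBytes;
    }

    public static void main(String[] args) {
        int rowWidth = 114;  // row width of the test case in the ticket
        System.out.println(recordsPerBatch(COPIER_BATCH_MEM_LIMIT, rowWidth)); // 2299
        System.out.println(recordsPerBatch(INITIAL_ALLOCATION, rowWidth));     // 87719
        System.out.println(unusedBytes(INITIAL_ALLOCATION, COPIER_BATCH_MEM_LIMIT)); // 9737856
    }
}
```

Under these assumptions, the 256K limit leaves about 9.7 MB of the reserved memory idle, while sizing batches against the full 10 MB would allow roughly 87K rows of this width per batch.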

