drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jacques Nadeau (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (DRILL-385) Implement Top-N sort operator
Date Tue, 04 Mar 2014 17:00:24 GMT

     [ https://issues.apache.org/jira/browse/DRILL-385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jacques Nadeau reassigned DRILL-385:
------------------------------------

    Assignee: Steven Phillips

> Implement Top-N sort operator
> -----------------------------
>
>                 Key: DRILL-385
>                 URL: https://issues.apache.org/jira/browse/DRILL-385
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Steven Phillips
>            Assignee: Steven Phillips
>         Attachments: DRILL-385.patch
>
>
> I want to give a brief summary of the design for this operator.
> This operator maintains a hyperbatch and a SelectionVector4. The length of the selectionVector
is (limit + 1). Indices [0..limit-1] are used to maintain a min-heap, with the value pointing
to the batch and index of the value it represents. The last element of the selectionVector
is used for staging the next incoming record. It is done this way because the generated methods
for doing comparisons uses the values stored in the selection vector as a pointer to the records
in the hyperbatch.
> The first N records to come in are added to the heap, and the heap property is reestablished
after each insertion. Once the heap has reached it's final size, each incoming record is compared
to the top of the heap. The heap is a min-heap, meaning the root of the heap contains the
lowest priority item in the heap. In other words, if we are looking for the top 10 values,
the root points to the current 10th value. Each incoming record is compared to the root, and
if necessary the two are swapped, and the heap property restored.
> Periodically, there can be a purge, in which the values referenced by the selection vector
are copied into new value vectors, and the old buffers are released, freeing up memory.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message