drill-dev mailing list archives

From "Jacques Nadeau (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-3907) Enable ScanBatch to provide a merge on pre-sorted readers
Date Wed, 07 Oct 2015 16:15:27 GMT
Jacques Nadeau created DRILL-3907:

             Summary: Enable ScanBatch to provide a merge on pre-sorted readers
                 Key: DRILL-3907
                 URL: https://issues.apache.org/jira/browse/DRILL-3907
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - Other
            Reporter: Jacques Nadeau
             Fix For: 1.3.0

In some situations, individual record readers will be presorted by a key. Given Drill's approach
to parallelization, a single ScanBatch may be interacting with multiple readers. If we want
to maintain the collation of the underlying data, we need Drill to do an n-way merge on the
streams as they are read into Drill. This functionality already exists in the MergingReceiver.
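
As a rough illustration of the n-way merge described above, here is a minimal sketch assuming each reader can be modeled as a plain Java Iterator that already yields records in sort-key order. The class NWayMerge and its merge method are hypothetical and are not Drill's RecordReader or MergingReceiver API, which operate on record batches rather than individual values. A min-heap holds the current head record of each reader, so merging n records from k readers costs O(n log k) comparisons.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Hypothetical sketch of an n-way merge over presorted readers.
 * Each "reader" is modeled as an Iterator that already yields records in
 * sort-key order; this is NOT Drill's RecordReader or MergingReceiver API.
 */
public final class NWayMerge {

    /** Heap entry: the current head record of one reader plus the reader it came from. */
    private static final class Head<T> {
        final T value;
        final Iterator<T> source;

        Head(T value, Iterator<T> source) {
            this.value = value;
            this.source = source;
        }
    }

    private NWayMerge() {}

    /** Merges k presorted iterators into one sorted list using O(n log k) comparisons. */
    public static <T> List<T> merge(List<Iterator<T>> readers, Comparator<? super T> order) {
        // Min-heap keyed on each reader's current head record.
        PriorityQueue<Head<T>> heap = new PriorityQueue<>(
            Math.max(1, readers.size()), (a, b) -> order.compare(a.value, b.value));
        for (Iterator<T> reader : readers) {
            if (reader.hasNext()) {
                heap.add(new Head<>(reader.next(), reader));
            }
        }
        List<T> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            // Emit the smallest head, then refill the heap from the same reader.
            Head<T> smallest = heap.poll();
            out.add(smallest.value);
            if (smallest.source.hasNext()) {
                heap.add(new Head<>(smallest.source.next(), smallest.source));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Iterator<Integer>> readers = List.of(
            List.of(1, 4, 7).iterator(),
            List.of(2, 5, 8).iterator(),
            List.of(3, 6, 9).iterator());
        System.out.println(merge(readers, Comparator.naturalOrder()));
        // prints [1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```

Keeping only one head per reader in the heap means memory stays proportional to the number of readers, not the number of records, which is what makes a streaming merge practical inside a scan.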

This JIRA is to refactor the MergingReceiver so that its underlying n-way merge of batches can
be used in other locations. We then need to decide whether to incorporate it directly into
the ScanBatch (when needed) or to do something external. We also need to resolve how to determine
whether the collation that an n-way merge could provide is actually necessary (to avoid paying
the cost of maintaining an unused collation).
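
One possible shape for this refactoring, sketched under loose assumptions: the merge becomes a standalone helper that any operator can call, and the scan uses it only when a downstream consumer actually depends on the sort order. Otherwise the scan simply concatenates the readers. The class CollationAwareScan, its methods, and the collationRequired flag are all hypothetical placeholders for whatever planner signal Drill would use; readers are again modeled as presorted Java Iterators rather than Drill record batches.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Hypothetical sketch: a reusable n-way merge helper plus a scan that uses it
 * only when collation is required. None of these names are Drill classes.
 */
public final class CollationAwareScan {

    private CollationAwareScan() {}

    /** Reusable merge: repeatedly emits the smallest current head among the readers. */
    public static <T> List<T> mergeSorted(List<Iterator<T>> readers, Comparator<? super T> order) {
        List<T> heads = new ArrayList<>();
        // Heap of reader indices, ordered by each reader's current head record.
        PriorityQueue<Integer> heap =
            new PriorityQueue<>((i, j) -> order.compare(heads.get(i), heads.get(j)));
        for (int i = 0; i < readers.size(); i++) {
            Iterator<T> reader = readers.get(i);
            if (reader.hasNext()) {
                heads.add(reader.next());
                heap.add(i);
            } else {
                heads.add(null); // placeholder; index i is never placed on the heap
            }
        }
        List<T> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int i = heap.poll();
            out.add(heads.get(i));
            Iterator<T> reader = readers.get(i);
            if (reader.hasNext()) {
                heads.set(i, reader.next()); // safe: index i is not in the heap right now
                heap.add(i);
            }
        }
        return out;
    }

    /** Cheap path: read each reader to completion, one after another, with no comparisons. */
    public static <T> List<T> concatenate(List<Iterator<T>> readers) {
        List<T> out = new ArrayList<>();
        for (Iterator<T> reader : readers) {
            reader.forEachRemaining(out::add);
        }
        return out;
    }

    /** Pays the O(n log k) merge cost only when a downstream operator relies on the sort order. */
    public static <T> List<T> scan(List<Iterator<T>> readers, Comparator<? super T> order,
                                   boolean collationRequired) {
        return collationRequired ? mergeSorted(readers, order) : concatenate(readers);
    }

    public static void main(String[] args) {
        Comparator<Integer> byValue = Comparator.naturalOrder();
        System.out.println(scan(List.of(List.of(1, 5).iterator(), List.of(2, 3).iterator()), byValue, true));
        // prints [1, 2, 3, 5]
        System.out.println(scan(List.of(List.of(1, 5).iterator(), List.of(2, 3).iterator()), byValue, false));
        // prints [1, 5, 2, 3]
    }
}
```

The uncollated path returns the same records in reader order, so the choice affects only cost and ordering, not the result set.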

This message was sent by Atlassian JIRA
