drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From GitBox <...@apache.org>
Subject [GitHub] [drill] Ben-Zvi commented on a change in pull request #1738: DRILL-7062: Initial implementation of run-time row-group pruning
Date Mon, 08 Apr 2019 03:59:59 GMT
Ben-Zvi commented on a change in pull request #1738: DRILL-7062: Initial implementation of
run-time row-group pruning
URL: https://github.com/apache/drill/pull/1738#discussion_r272879356
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetScanBatchCreator.java
 ##########
 @@ -149,6 +219,77 @@ protected ScanBatch getBatch(ExecutorFragmentContext context, AbstractParquetRow
     return new ScanBatch(context, oContext, readers, implicitColumns);
   }
 
+  /**
+   *  Create a reader and add it to the list of readers.
+   *
+   * @param context
+   * @param rowGroupScan
+   * @param oContext
+   * @param columnExplorer
+   * @param readers
+   * @param implicitColumns
+   * @param mapWithMaxColumns
+   * @param rowGroup
+   * @param fs
+   * @param footer
+   * @param readSchemaOnly - if true sets the number of rows to read to be zero
+   * @return
+   */
+  private Map<String, String> createReaderAndImplicitColumns(ExecutorFragmentContext
context,
+                                                             AbstractParquetRowGroupScan
rowGroupScan,
+                                                             OperatorContext oContext,
+                                                             ColumnExplorer columnExplorer,
+                                                             List<RecordReader> readers,
+                                                             List<Map<String, String>>
implicitColumns,
+                                                             Map<String, String> mapWithMaxColumns,
+                                                             RowGroupReadEntry rowGroup,
+                                                             DrillFileSystem fs,
+                                                             ParquetMetadata footer,
+                                                             boolean readSchemaOnly) {
+    ParquetReaderConfig readerConfig = rowGroupScan.getReaderConfig();
+    ParquetReaderUtility.DateCorruptionStatus containsCorruptDates = ParquetReaderUtility.detectCorruptDates(footer,
+      rowGroupScan.getColumns(), readerConfig.autoCorrectCorruptedDates());
+    logger.debug("Contains corrupt dates: {}.", containsCorruptDates);
+
+    boolean useNewReader = context.getOptions().getBoolean(ExecConstants.PARQUET_NEW_RECORD_READER);
+    boolean containsComplexColumn = ParquetReaderUtility.containsComplexColumn(footer, rowGroupScan.getColumns());
+    logger.debug("PARQUET_NEW_RECORD_READER is {}. Complex columns {}.", useNewReader ? "enabled"
: "disabled",
+        containsComplexColumn ? "found." : "not found.");
+    RecordReader reader;
+
+    if (useNewReader || containsComplexColumn) {
+      reader = new DrillParquetReader(context,
+          footer,
+          rowGroup,
+          columnExplorer.getTableColumns(),
+          fs,
+          containsCorruptDates);
+    } else {
+      reader = new ParquetRecordReader(context,
+          rowGroup.getPath(),
+          rowGroup.getRowGroupIndex(),
+          rowGroup.getNumRecordsToRead(), // if readSchemaOnly - then set to zero rows to
read (currently breaks the ScanBatch)
 
 Review comment:
   This was a last minute change (before it used this flag to set the num-rows-to-read to
zero).
   Soon (another Jira) we should retry implementing this option, so left that flag in the
code as a reminder (maybe should be just a TODO comment)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

Mime
View raw message