drill-dev mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [drill] amansinha100 commented on a change in pull request #1738: DRILL-7062: Initial implementation of run-time row-group pruning
Date Mon, 08 Apr 2019 02:31:52 GMT
amansinha100 commented on a change in pull request #1738: DRILL-7062: Initial implementation of run-time row-group pruning
URL: https://github.com/apache/drill/pull/1738#discussion_r272868355
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetScanBatchCreator.java
 ##########
 @@ -149,6 +219,77 @@ protected ScanBatch getBatch(ExecutorFragmentContext context, AbstractParquetRow
     return new ScanBatch(context, oContext, readers, implicitColumns);
   }
 
+  /**
+   *  Create a reader and add it to the list of readers.
+   *
+   * @param context
+   * @param rowGroupScan
+   * @param oContext
+   * @param columnExplorer
+   * @param readers
+   * @param implicitColumns
+   * @param mapWithMaxColumns
+   * @param rowGroup
+   * @param fs
+   * @param footer
+   * @param readSchemaOnly - if true sets the number of rows to read to be zero
+   * @return
+   */
+  private Map<String, String> createReaderAndImplicitColumns(ExecutorFragmentContext context,
+                                                             AbstractParquetRowGroupScan rowGroupScan,
+                                                             OperatorContext oContext,
+                                                             ColumnExplorer columnExplorer,
+                                                             List<RecordReader> readers,
+                                                             List<Map<String, String>> implicitColumns,
+                                                             Map<String, String> mapWithMaxColumns,
+                                                             RowGroupReadEntry rowGroup,
+                                                             DrillFileSystem fs,
+                                                             ParquetMetadata footer,
+                                                             boolean readSchemaOnly) {
+    ParquetReaderConfig readerConfig = rowGroupScan.getReaderConfig();
+    ParquetReaderUtility.DateCorruptionStatus containsCorruptDates = ParquetReaderUtility.detectCorruptDates(footer,
+      rowGroupScan.getColumns(), readerConfig.autoCorrectCorruptedDates());
+    logger.debug("Contains corrupt dates: {}.", containsCorruptDates);
+
+    boolean useNewReader = context.getOptions().getBoolean(ExecConstants.PARQUET_NEW_RECORD_READER);
+    boolean containsComplexColumn = ParquetReaderUtility.containsComplexColumn(footer, rowGroupScan.getColumns());
+    logger.debug("PARQUET_NEW_RECORD_READER is {}. Complex columns {}.", useNewReader ? "enabled" : "disabled",
+        containsComplexColumn ? "found." : "not found.");
+    RecordReader reader;
+
+    if (useNewReader || containsComplexColumn) {
+      reader = new DrillParquetReader(context,
+          footer,
+          rowGroup,
+          columnExplorer.getTableColumns(),
+          fs,
+          containsCorruptDates);
+    } else {
+      reader = new ParquetRecordReader(context,
+          rowGroup.getPath(),
+          rowGroup.getRowGroupIndex(),
+          rowGroup.getNumRecordsToRead(), // if readSchemaOnly - then set to zero rows to read (currently breaks the ScanBatch)
 
 Review comment:
   The `readSchemaOnly` parameter is never used in this function, so why pass it in at all?
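
   For illustration only, here is a minimal sketch of what honoring the flag might look like, assuming the intent stated in the inline comment (request zero rows when `readSchemaOnly` is true). `ReadLimitSketch` and `rowsToRead` are hypothetical names, not part of Drill, and the `long` row count is an assumption. The inline comment says that passing zero currently breaks the ScanBatch, so this shows the shape of the idea rather than a proposed fix:

```java
// Hypothetical helper, not Drill code: shows the flag actually influencing the row count.
public final class ReadLimitSketch {

  private ReadLimitSketch() {
    // static helper only; no instances
  }

  /**
   * Returns the number of rows a reader should be asked to read.
   *
   * @param numRecordsToRead row count taken from the row group entry (assumed to be a long)
   * @param readSchemaOnly if true, request zero rows so that only the schema is produced
   * @return zero when readSchemaOnly is set, otherwise numRecordsToRead unchanged
   */
  public static long rowsToRead(long numRecordsToRead, boolean readSchemaOnly) {
    return readSchemaOnly ? 0L : numRecordsToRead;
  }

  public static void main(String[] args) {
    // Normal read: the row count passes through untouched.
    System.out.println(rowsToRead(1000L, false));
    // Schema-only read: the flag overrides the row count.
    System.out.println(rowsToRead(1000L, true));
  }
}
```

   If the flag cannot be honored yet, dropping it from the signature until it can would keep the parameter list honest.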
