drill-dev mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [drill] dvjyothsna commented on a change in pull request #1723: DRILL-7063: Seperate metadata cache file into summary, file metadata
Date Sun, 07 Apr 2019 22:20:17 GMT
dvjyothsna commented on a change in pull request #1723: DRILL-7063: Seperate metadata cache
file into summary, file metadata
URL: https://github.com/apache/drill/pull/1723#discussion_r272854637
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/metadata/Metadata.java
 ##########
 @@ -149,20 +157,25 @@ public static ParquetTableMetadata_v3 getParquetTableMetadata(Map<FileStatus,
Fi
    * Get the parquet metadata for the table by reading the metadata file
    *
    * @param fs current file system
-   * @param path The path to the metadata file, located in the directory that contains the parquet files
+   * @param paths The path to the metadata file, located in the directory that contains the parquet files
    * @param metaContext metadata context
    * @param readerConfig parquet reader configuration
    * @return parquet table metadata. Null if metadata cache is missing, unsupported or corrupted
    */
   public static @Nullable ParquetTableMetadataBase readBlockMeta(FileSystem fs,
-                                                                 Path path,
+                                                                 List<Path> paths,
                                                                  MetadataContext metaContext,
                                                                  ParquetReaderConfig readerConfig) {
-    if (ignoreReadingMetadata(metaContext, path)) {
-      return null;
-    }
     Metadata metadata = new Metadata(readerConfig);
-    metadata.readBlockMeta(path, false, metaContext, fs);
+    if (paths.isEmpty()) {
+      metaContext.setMetadataCacheCorrupted(true);
+    }
+    for (Path path: paths) {
+      if (ignoreReadingMetadata(metaContext, path)) {
+        return null;
 
 Review comment:
   Let's take the scenario where the summary file is corrupted but the file metadata file
is intact. If we pull ignoreReadingMetadata() out of the loop, the check runs only once,
before the summary file is read. Reading the corrupted summary file then marks the metadata
cache as corrupted (setMetadataCacheCorrupted(true)), but the file metadata file will still
be read, because nothing re-checks that status before the second read. Regarding performance:
if paths[1] is corrupt, we only find that out inside readBlockMeta(), so pulling out
ignoreReadingMetadata() doesn't help that case either. Also, paths[0] always holds the
summary, and since the summary is not very big, keeping the check inside the loop shouldn't
hurt performance much.
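
   To make the scenario concrete, here is a minimal, self-contained sketch of the two
placements of the check. It is not the Drill code: Context, ignoreReading(), read() and the
file names are hypothetical stand-ins for MetadataContext, ignoreReadingMetadata() and
readBlockMeta(), and the sketch assumes that reading a corrupt file is what sets the
corrupted flag.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MetadataCheckSketch {
  /** Hypothetical stand-in for MetadataContext: tracks only the corrupted flag. */
  static class Context {
    boolean corrupted;
  }

  /** Hypothetical stand-in for ignoreReadingMetadata(): skip once the cache is marked corrupted. */
  static boolean ignoreReading(Context ctx) {
    return ctx.corrupted;
  }

  /** Simulates reading one cache file; a name starting with "corrupt" marks the context corrupted. */
  static void read(String file, Context ctx, List<String> readLog) {
    readLog.add(file);
    if (file.startsWith("corrupt")) {
      ctx.corrupted = true;
    }
  }

  /** Check hoisted out of the loop: evaluated once, before the summary is read. */
  static List<String> checkOnce(List<String> files) {
    Context ctx = new Context();
    List<String> readLog = new ArrayList<>();
    if (ignoreReading(ctx)) {
      return readLog;
    }
    for (String file : files) {
      read(file, ctx, readLog);
    }
    return readLog;
  }

  /** Check kept inside the loop: re-evaluated before every file, so a corrupt summary stops the reads. */
  static List<String> checkPerFile(List<String> files) {
    Context ctx = new Context();
    List<String> readLog = new ArrayList<>();
    for (String file : files) {
      if (ignoreReading(ctx)) {
        break;
      }
      read(file, ctx, readLog);
    }
    return readLog;
  }
}
```

   With a corrupt summary followed by an intact file metadata file, checkOnce() reads both
files, while checkPerFile() stops after the summary. The sketch returns the list of files
read so the difference is observable; the method in the PR returns null instead.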
   
   

