drill-dev mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [drill] paul-rogers commented on a change in pull request #2026: DRILL-7330: Implement metadata usage for all format plugins
Date Sat, 14 Mar 2020 19:37:14 GMT
paul-rogers commented on a change in pull request #2026: DRILL-7330: Implement metadata usage for all format plugins
URL: https://github.com/apache/drill/pull/2026#discussion_r392607688
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/physical/base/AbstractGroupScanWithMetadata.java
 ##########
 @@ -634,6 +642,62 @@ public NonInterestingColumnsMetadata getNonInterestingColumnsMetadata() {
     return nonInterestingColumnsMetadata;
   }
 
+  /**
+   * Returns {@link TableMetadataProviderBuilder} instance based on specified
+   * {@link MetadataProviderManager} source.
+   *
+   * @param source metadata provider manager
+   * @return {@link TableMetadataProviderBuilder} instance
+   */
+  protected abstract TableMetadataProviderBuilder tableMetadataProviderBuilder(MetadataProviderManager source);
+
+  /**
+   * Returns {@link TableMetadataProviderBuilder} instance which may provide metadata
+   * without using Drill Metastore.
+   *
+   * @param source metadata provider manager
+   * @return {@link TableMetadataProviderBuilder} instance
+   */
+  protected abstract TableMetadataProviderBuilder defaultTableMetadataProviderBuilder(MetadataProviderManager source);
+
+  /**
+   * Compares the last modified time of files obtained from specified selection with
+   * the Metastore last modified time to determine whether Metastore metadata
+   * is not outdated. If metadata is outdated, {@link MetadataException} will be thrown.
+   *
+   * @param selection the source of files to check
+   * @throws MetadataException if metadata is outdated
+   */
+  protected void checkMetadataConsistency(FileSelection selection, Configuration fsConf) throws IOException {
+    if (metadataProvider.checkMetadataVersion()) {
+      DrillFileSystem fileSystem =
+          ImpersonationUtil.createFileSystem(ImpersonationUtil.resolveUserName(getUserName()), fsConf);
+
+      List<FileStatus> fileStatuses = FileMetadataInfoCollector.getFileStatuses(selection, fileSystem);
+
+      long lastModifiedTime = metadataProvider.getTableMetadata().getLastModifiedTime();
+
+      Set<Path> removedFiles = new HashSet<>(metadataProvider.getFilesMetadataMap().keySet());
+      Set<Path> newFiles = new HashSet<>();
+
+      boolean isChanged = false;
+
+      for (FileStatus fileStatus : fileStatuses) {
+        if (!removedFiles.remove(Path.getPathWithoutSchemeAndAuthority(fileStatus.getPath()))) {
+          newFiles.add(fileStatus.getPath());
+        }
+        if (fileStatus.getModificationTime() > lastModifiedTime) {
+          isChanged = true;
+          break;
+        }
+      }
 
 Review comment:
   The check above will be very costly for millions of files, since it collects a `FileStatus` for every file in the selection before it compares anything. Is there a way for an external system to ping us when a file is added? Or could Drill work with the files it already knows about until an external system tells us to refresh? And could we run that refresh in parallel, so that queries continue to run against the existing metadata while we gather the new set?
   
   Again, imagine the case that @dobesv recently explained: millions of files, with data constantly arriving.
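
   To make the last question concrete, here is a minimal sketch of one way it could look, not a proposal for the actual API. Everything in it is hypothetical: `RefreshableMetadata`, `Snapshot`, and `refreshAsync` do not exist in Drill, and the `loader` stands in for whatever would gather file metadata. Planning reads an immutable snapshot, and a refresh (triggered by an external notification, say) builds a new snapshot off to the side and swaps it in only when it is complete.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical sketch; none of these types exist in Drill. */
final class RefreshableMetadata {

  /** Immutable view of a table's files as of one completed refresh. */
  static final class Snapshot {
    final Set<String> files;
    final long lastModifiedTime;

    Snapshot(Set<String> files, long lastModifiedTime) {
      // Defensive copy so a published snapshot can never change under a reader.
      this.files = Collections.unmodifiableSet(new HashSet<>(files));
      this.lastModifiedTime = lastModifiedTime;
    }
  }

  private final AtomicReference<Snapshot> current;
  private final Supplier<Snapshot> loader; // assumed: gathers fresh metadata
  private final Executor executor;         // assumed: background refresh pool

  RefreshableMetadata(Snapshot initial, Supplier<Snapshot> loader, Executor executor) {
    this.current = new AtomicReference<>(initial);
    this.loader = loader;
    this.executor = executor;
  }

  /** Queries plan against whatever snapshot is current; no file listing here. */
  Snapshot current() {
    return current.get();
  }

  /**
   * Builds a new snapshot on the executor and publishes it once it is ready.
   * Readers keep seeing the old snapshot until the swap. Overlapping refreshes
   * are not ordered in this sketch; real code would need to handle that.
   */
  CompletableFuture<Snapshot> refreshAsync() {
    return CompletableFuture.supplyAsync(loader, executor)
        .thenApply(fresh -> {
          current.set(fresh);
          return fresh;
        });
  }
}
```

   With something like this, the per-query cost is a single reference read, and the expensive listing happens only when we are told that something changed.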

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
