drill-dev mailing list archives

From "Aman Sinha (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-3973) Add profiling for the time spent in metadata operations and planning
Date Fri, 23 Oct 2015 23:27:27 GMT
Aman Sinha created DRILL-3973:

             Summary: Add profiling for the time spent in metadata operations and planning
                 Key: DRILL-3973
                 URL: https://issues.apache.org/jira/browse/DRILL-3973
             Project: Apache Drill
          Issue Type: Improvement
          Components: Metadata, Query Planning & Optimization
    Affects Versions: 1.2.0
            Reporter: Aman Sinha
            Assignee: Mehant Baid

To determine where time is spent during metadata operations and query planning (which
includes partition pruning), we need to add more profiling:
  - The time to read the Parquet metadata from the Parquet files is already logged, but the same
needs to be done when the metadata is read from the cache file.
  - The analysis of whether a column is a candidate partition column (by comparing its min/max
values) should be profiled.
  - ParquetGroupScan.init() needs finer-grained timings.
  - The places where getFileStatusList() is called need to be profiled, since this is an expensive
operation for a large number of files (hundreds of thousands).
  - PruneScanRule: currently the profile timings are collected per batch of files. Finer-grained
timings are needed, collected separately for the interpreter evaluation of the filter, the
analysis of the filter condition, etc.
  - Add instrumentation around the places where affinity analysis is done. 
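
The per-phase timings requested above could be gathered with a small accumulator that sums wall-clock time per named phase. Summing matters for phases such as PruneScanRule that run once per batch of files. The sketch below is a plain-JDK illustration only, not Drill's actual instrumentation: the class PhaseTimer, its methods, and the phase names ("readMetadataCache", "pruneFilterEvaluation") are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not part of Drill) that accumulates wall-clock time per
 * named planning phase, so repeated calls (e.g. one per batch of files during
 * partition pruning) are summed instead of being reported individually.
 */
public class PhaseTimer {
  // Insertion-ordered so the summary lists phases in the order they first ran.
  private final Map<String, Long> elapsedNanos = new LinkedHashMap<>();
  private final Map<String, Integer> counts = new LinkedHashMap<>();

  /** A running measurement; closing it adds the elapsed time to its phase. */
  public final class Split implements AutoCloseable {
    private final String phase;
    private final long start = System.nanoTime();

    private Split(String phase) { this.phase = phase; }

    @Override
    public void close() {
      long delta = System.nanoTime() - start;
      elapsedNanos.merge(phase, delta, Long::sum);
      counts.merge(phase, 1, Integer::sum);
    }
  }

  /** Starts timing one invocation of the named phase. */
  public Split start(String phase) { return new Split(phase); }

  /** Total time recorded for a phase, in milliseconds (0 if never run). */
  public long elapsedMillis(String phase) {
    return TimeUnit.NANOSECONDS.toMillis(elapsedNanos.getOrDefault(phase, 0L));
  }

  /** Number of completed invocations of a phase (0 if never run). */
  public int count(String phase) { return counts.getOrDefault(phase, 0); }

  /** One line per phase: name, number of calls, total milliseconds. */
  public String summary() {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, Long> e : elapsedNanos.entrySet()) {
      sb.append(String.format("%s: %d call(s), %d ms%n", e.getKey(),
          counts.get(e.getKey()), TimeUnit.NANOSECONDS.toMillis(e.getValue())));
    }
    return sb.toString();
  }

  public static void main(String[] args) throws InterruptedException {
    PhaseTimer timer = new PhaseTimer();
    // Stand-ins for the phases listed in the issue; Thread.sleep simulates work.
    try (PhaseTimer.Split s = timer.start("readMetadataCache")) {
      Thread.sleep(20);
    }
    for (int batch = 0; batch < 3; batch++) {
      try (PhaseTimer.Split s = timer.start("pruneFilterEvaluation")) {
        Thread.sleep(5);
      }
    }
    System.out.print(timer.summary());
  }
}
```

Using try-with-resources guarantees that a phase's time is recorded even if the timed code throws. Reporting both the call count and the total shows whether a slow phase is slow per call or is simply invoked many times, such as once per batch of hundreds of thousands of files.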

Such profiling is needed to understand excessively long planning times when a large number of
files is present.

This message was sent by Atlassian JIRA
