drill-dev mailing list archives

From "Aman Sinha (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-3973) Add profiling for the time spent in metadata operations and planning
Date Fri, 23 Oct 2015 23:27:27 GMT
Aman Sinha created DRILL-3973:
---------------------------------

             Summary: Add profiling for the time spent in metadata operations and planning
                 Key: DRILL-3973
                 URL: https://issues.apache.org/jira/browse/DRILL-3973
             Project: Apache Drill
          Issue Type: Improvement
          Components: Metadata, Query Planning & Optimization
    Affects Versions: 1.2.0
            Reporter: Aman Sinha
            Assignee: Mehant Baid


To determine where time is spent during metadata operations and query planning (which
includes partition pruning), we need to add more profiling:
  - The time to read the Parquet metadata from the Parquet files is already logged, but the same
needs to be done when the metadata is read from the cache file.
  - the analysis of whether a column is a candidate partition column by comparing the min/max
values should be profiled. 
  - ParquetGroupScan.init() needs finer-grained timings.
  - The places where getFileStatusList() is called need to be profiled, since this is an expensive
operation for a large number of files (hundreds of thousands).
  - PruneScanRule: currently the profile timings are collected for each batch of files. We need
finer-grained timings that cover interpreter evaluation of the filter, analysis of the filter
condition, etc.
  - Add instrumentation around the places where affinity analysis is done. 
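
As an illustration of the kind of instrumentation listed above, here is a minimal sketch of a
helper that accumulates wall-clock time per named phase. PhaseTimer and its methods are
hypothetical names introduced for this example and are not part of Drill; the sketch uses only
the JDK and assumes that millisecond resolution is enough for planning-time analysis.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical helper (not a Drill class): accumulates elapsed time per named phase. */
final class PhaseTimer {
  // Insertion-ordered so the summary lists phases in the order they first ran.
  private final Map<String, Long> nanosByPhase = new LinkedHashMap<>();

  /** Runs the work, adds its elapsed time to the named phase, and returns the result. */
  <T> T time(String phase, Supplier<T> work) {
    long start = System.nanoTime();
    try {
      return work.get();
    } finally {
      // Repeated calls with the same phase name are summed, e.g. once per batch of files.
      nanosByPhase.merge(phase, System.nanoTime() - start, Long::sum);
    }
  }

  /** Total time recorded for the phase in milliseconds, or 0 if it never ran. */
  long elapsedMillis(String phase) {
    return TimeUnit.NANOSECONDS.toMillis(nanosByPhase.getOrDefault(phase, 0L));
  }

  /** One "phase: N ms" line per recorded phase, suitable for a single debug log entry. */
  String summary() {
    StringBuilder sb = new StringBuilder();
    nanosByPhase.forEach((phase, nanos) ->
        sb.append(phase).append(": ")
          .append(TimeUnit.NANOSECONDS.toMillis(nanos)).append(" ms\n"));
    return sb.toString();
  }
}
```

A caller would wrap each suspect step, for example timer.time("getFileStatusList", () -> ...),
and log timer.summary() once at the end of planning. Summing per phase keeps the log to one
entry per query even when a step runs once per batch of files.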

Such profiling is needed to understand excessively long planning times when a large number of
files is present.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
