drill-dev mailing list archives

From "Parth Chandra (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-4053) Reduce metadata cache file size
Date Mon, 09 Nov 2015 18:55:11 GMT
Parth Chandra created DRILL-4053:

             Summary: Reduce metadata cache file size
                 Key: DRILL-4053
                 URL: https://issues.apache.org/jira/browse/DRILL-4053
             Project: Apache Drill
          Issue Type: Improvement
          Components: Metadata
    Affects Versions: 1.3.0
            Reporter: Parth Chandra
            Assignee: Parth Chandra
             Fix For: 1.4.0

The Parquet metadata cache file contains a fair amount of redundant metadata, which bloats
the size of the cache file. Two things that we can reduce are:
1) The schema is repeated for every row group. We can keep a single merged schema instead
(similar to what was discussed for the insert into functionality).
2) The max and min values in the stats are used for partition pruning when the two values
are the same. We can keep only the maxValue, and only when it is the same as the minValue.
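
To make the two reductions concrete, here is a minimal Python sketch. It is not Drill's
actual cache format or code: the input layout, the field names (rowCount, columns, min, max,
schema, stats) and the function compact_metadata are all assumptions made for illustration.
Only maxValue and minValue come from the description above. The sketch also assumes a
column has the same type in every row group and raises an error otherwise, which a real
merged schema would have to handle more carefully.

```python
import json


def compact_metadata(row_groups):
    """Build a smaller representation of per-row-group metadata.

    Hypothetical input: a list of row groups, each a dict with a "rowCount"
    and a "columns" list of {"name", "type", "min", "max"} entries.
    """
    merged_schema = {}   # column name -> type, stored once for the whole file
    compact_groups = []

    for rg in row_groups:
        stats = {}
        for col in rg["columns"]:
            name = col["name"]

            # Reduction 1: record the column type once instead of per row group.
            seen_type = merged_schema.setdefault(name, col["type"])
            if seen_type != col["type"]:
                # Simplifying assumption of this sketch: no type conflicts.
                raise ValueError(f"conflicting types for column {name!r}")

            # Reduction 2: keep only maxValue, and only when min == max,
            # since that is the case the stats are used for in pruning.
            lo, hi = col.get("min"), col.get("max")
            if lo is not None and lo == hi:
                stats[name] = {"maxValue": hi}

        compact_groups.append({"rowCount": rg["rowCount"], "stats": stats})

    return {"schema": merged_schema, "rowGroups": compact_groups}


# Made-up sample metadata: "year" is single-valued per row group, "amount" is not.
row_groups = [
    {"rowCount": 100, "columns": [
        {"name": "year", "type": "INT32", "min": 2015, "max": 2015},
        {"name": "amount", "type": "DOUBLE", "min": 1.5, "max": 99.0},
    ]},
    {"rowCount": 80, "columns": [
        {"name": "year", "type": "INT32", "min": 2014, "max": 2014},
        {"name": "amount", "type": "DOUBLE", "min": 0.5, "max": 42.0},
    ]},
]

compact = compact_metadata(row_groups)
original_size = len(json.dumps(row_groups))
compact_size = len(json.dumps(compact))
```

In this sample, the year column keeps a maxValue in each row group because its min and max
match, while amount contributes no stats at all. Comparing original_size with compact_size
gives a rough feel for the saving on this toy input; it says nothing about real cache files.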

This message was sent by Atlassian JIRA
