drill-dev mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [drill] vvysotskyi commented on a change in pull request #1986: Additional changes for Drill Metastore docs
Date Tue, 03 Mar 2020 15:29:38 GMT
vvysotskyi commented on a change in pull request #1986: Additional changes for Drill Metastore
URL: https://github.com/apache/drill/pull/1986#discussion_r386535350

 File path: _docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 @@ -10,6 +10,31 @@ The Metastore is a Beta feature; it is subject to change. We encourage you to tr
 Because the Metastore is in Beta, the SQL commands and Metastore formats may change in the next release.
 {% include startnote.html %}In Drill 1.17, this feature is supported for Parquet tables only and is disabled by default.{% include endnote.html %}
+## Drill Metastore introduction
+One of the main advantages of Drill is schema-on-read. But Drill can’t handle some cases with this approach, there are the issues related to Schema Evolution and Schema Changes.
+Significant benefits of schema-aware execution:
+ - At Planning time:
+    - Better scope for planning optimizations.
+    - Proper estimation of column widths since types are known, hence more accurate costing.
+    - Graceful early exit if certain data type validations fail.
+ - At Runtime:
+    - Avoids some cases with `SchemaChange` exceptions. All minor fragments will have a common understanding of the schema.
+Reading the data along with its statistics metadata helps to build more efficient plans and optimize query execution:
+ - Crucial for optimal join planning, 2-phase aggregation vs 1-phase aggregation planning, selectivity estimation of filter conditions, parallelization decisions.
+Taking into account the above points, existing query processing can be improved by:
+ - storing table schema and reusing it;
+ - collecting, storing and reusing table statistics to improve query planning.
+One of the main steps to resolve all these goals is providing the framework for Metadata management named hereafter
 Review comment:
   Thanks, done.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:

With regards,
Apache Git Services
