drill-dev mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [drill] paul-rogers commented on a change in pull request #1953: Add docs for Drill Metastore
Date Tue, 04 Feb 2020 03:12:28 GMT
paul-rogers commented on a change in pull request #1953: Add docs for Drill Metastore
URL: https://github.com/apache/drill/pull/1953#discussion_r374439368

 File path: _docs/performance-tuning/drill-metastore/010-using-drill-metastore.md
 @@ -0,0 +1,408 @@
+title: "Using Drill Metastore"
+parent: "Drill Metastore"
+date: 2020-01-31
+Drill 1.17 introduces the Drill Metastore, which stores the table schema and table statistics. Statistics allow Drill to create better query plans.
+The Metastore is a Beta feature; it is subject to change. We encourage you to try it and provide feedback.
+Because the Metastore is in Beta, the SQL commands and Metastore formats may change in the next release.
+{% include startnote.html %}In Drill 1.17, this feature is supported for Parquet tables only and is disabled by default.{% include endnote.html %}
+## Enabling Drill Metastore
+To use the Drill Metastore, you must enable it at the session or system level with one of the following commands:
+	SET `metastore.enabled` = true;
+	ALTER SYSTEM SET `metastore.enabled` = true;
+Alternatively, you can enable the option in the Drill Web UI at `http://<drill-hostname-or-ip-address>:8047/options`.
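A session-level setting takes precedence over the system-level value, which in turn takes precedence over the built-in default. The minimal Java sketch below models that lookup order for `metastore.enabled`. The `OptionScopes` class and its methods are hypothetical illustrations, not Drill's real option-handling code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of session/system option scopes; not Drill's real option code.
class OptionScopes {
    private final Map<String, Boolean> defaults = new HashMap<>();
    private final Map<String, Boolean> system = new HashMap<>();
    private final Map<String, Boolean> session = new HashMap<>();

    OptionScopes() {
        // In Drill 1.17 the Metastore is disabled by default.
        defaults.put("metastore.enabled", false);
    }

    // Models: ALTER SYSTEM SET `name` = value;
    void alterSystem(String name, boolean value) {
        system.put(name, value);
    }

    // Models: SET `name` = value; (affects the current session only)
    void setSession(String name, boolean value) {
        session.put(name, value);
    }

    // A session value wins over a system value, which wins over the default.
    boolean effective(String name) {
        if (session.containsKey(name)) {
            return session.get(name);
        }
        if (system.containsKey(name)) {
            return system.get(name);
        }
        return defaults.getOrDefault(name, false);
    }

    public static void main(String[] args) {
        OptionScopes options = new OptionScopes();
        System.out.println(options.effective("metastore.enabled")); // false
        options.alterSystem("metastore.enabled", true);
        System.out.println(options.effective("metastore.enabled")); // true
        options.setSession("metastore.enabled", false);
        System.out.println(options.effective("metastore.enabled")); // false: session overrides system
    }
}
```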
+## Computing and storing table metadata in Drill Metastore
+Once you enable the Metastore, the next step is to populate it with data. Drill can query a table whether or not that table has a Metastore entry. (If you are familiar with Hive, you know that Hive requires all tables to have Hive Metastore entries before you can query them.) In Drill, add data to the Metastore only when doing so improves query performance. In general, large tables benefit from statistics more than small tables do.
+Unlike Hive, Drill does not require you to declare a schema. Instead, Drill infers the schema by scanning your table, the same way it does for a regular `SELECT` query. It also computes what Drill calls "metadata", such as `MIN` / `MAX` column values and `NULLS_COUNT`, which enables further optimizations such as filter push-down. If the `planner.statistics.use` option is enabled, collecting metadata also calculates table statistics and stores them in the Drill Metastore.
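Filter push-down shows why `MIN` / `MAX` and `NULLS_COUNT` are useful. If a row group's stored range cannot satisfy a predicate, the scan can skip that row group without reading its data. The Java sketch below illustrates this for an equality filter and an `IS NULL` filter. `RowGroupPruner` and `ColumnStats` are hypothetical names, and Drill's actual planner logic is considerably more involved.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual min/max pruning; not Drill's planner code.
class RowGroupPruner {
    // Hypothetical per-row-group statistics for one numeric column.
    record ColumnStats(String rowGroup, long min, long max, long nullsCount, long rowCount) {}

    // For `col = value`: keep row groups whose [min, max] range could contain value.
    static List<String> forEquals(List<ColumnStats> stats, long value) {
        List<String> keep = new ArrayList<>();
        for (ColumnStats s : stats) {
            // An all-null row group has no meaningful range and cannot match equality.
            boolean allNull = s.nullsCount() == s.rowCount();
            if (!allNull && value >= s.min() && value <= s.max()) {
                keep.add(s.rowGroup());
            }
        }
        return keep;
    }

    // For `col IS NULL`: keep only row groups that contain at least one null.
    static List<String> forIsNull(List<ColumnStats> stats) {
        List<String> keep = new ArrayList<>();
        for (ColumnStats s : stats) {
            if (s.nullsCount() > 0) {
                keep.add(s.rowGroup());
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        List<ColumnStats> stats = List.of(
                new ColumnStats("rg0", 1, 100, 0, 1000),
                new ColumnStats("rg1", 101, 200, 5, 1000),
                new ColumnStats("rg2", 201, 300, 0, 1000));
        System.out.println(forEquals(stats, 150)); // [rg1]
        System.out.println(forIsNull(stats));      // [rg1]
    }
}
```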
+## Configuration
+The default Metastore configuration is defined in the `drill-metastore-default.conf` file.
+You can override it in `drill-metastore-override.conf`. Distribution-specific configuration can be
+specified in `drill-metastore-distrib.conf`.
+All configuration properties should reside in the `drill.metastore` namespace.
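The layered files act as successive overrides: a key set in a higher-precedence layer replaces the same key from a lower one. The sketch below models this with plain Java maps. It assumes the precedence default < distrib < override, by analogy with Drill's `drill-override.conf` convention; verify the exact order in the Drill source if it matters. `ConfigLayers` and `com.example.CustomMetastore` are hypothetical. Drill itself reads these HOCON files through the Typesafe Config library.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of layered Metastore configuration; not Drill's loader.
class ConfigLayers {
    // Layers are listed from lowest to highest precedence (assumed order).
    static Map<String, String> merge(List<Map<String, String>> layers) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map<String, String> layer : layers) {
            result.putAll(layer); // later layers overwrite earlier keys
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> defaults = Map.of(
                "drill.metastore.implementation.class",
                "org.apache.drill.metastore.iceberg.IcebergMetastore");
        Map<String, String> distrib = Map.of();
        Map<String, String> override = Map.of(
                "drill.metastore.implementation.class",
                "com.example.CustomMetastore"); // hypothetical custom class
        Map<String, String> effective = merge(List.of(defaults, distrib, override));
        System.out.println(effective.get("drill.metastore.implementation.class"));
        // com.example.CustomMetastore
    }
}
```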
+The Metastore implementation is selected by the `drill.metastore.implementation.class` config property.
+The default value is the following:
+	drill.metastore: {
+	  implementation.class: "org.apache.drill.metastore.iceberg.IcebergMetastore"
+	}
+Note that currently only the Iceberg Metastore is available out of the box, and it is the default. However, you can add a custom implementation by placing a JAR that contains an implementation of the `org.apache.drill.metastore.Metastore` interface on the classpath and setting `drill.metastore.implementation.class` to that class.
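Because the implementation is pluggable, Drill only needs a class name from the configuration. Java reflection is a common way to turn such a name into an instance, as the sketch below shows. `MetastoreLike` and `InMemoryMetastore` are hypothetical stand-ins for `org.apache.drill.metastore.Metastore` and a custom implementation. The no-argument constructor is an assumption made for brevity; Drill's real loader may pass configuration to the constructor.

```java
// Loads a metastore implementation by class name (conceptual sketch).
class MetastoreLoader {
    // Instantiates the class named by a config value such as
    // drill.metastore.implementation.class. asSubclass throws ClassCastException
    // if the class does not implement the expected interface.
    static MetastoreLike load(String className) {
        try {
            Class<? extends MetastoreLike> cls =
                    Class.forName(className).asSubclass(MetastoreLike.class);
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        MetastoreLike metastore = load(InMemoryMetastore.class.getName());
        System.out.println(metastore.describe()); // in-memory metastore
    }
}

// Hypothetical stand-in for the org.apache.drill.metastore.Metastore interface.
interface MetastoreLike {
    String describe();
}

// Hypothetical custom implementation that would be packaged in its own JAR.
class InMemoryMetastore implements MetastoreLike {
    public InMemoryMetastore() {
        // A no-argument constructor is assumed here for simplicity.
    }

    @Override
    public String describe() {
        return "in-memory metastore";
    }
}
```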
+### Metastore Components
+The Metastore can store metadata for various components: tables, views, etc.
+The current implementation fully supports the tables component.
+The views component is not implemented, but it contains stub methods that show
+how new Metastore components, such as UDFs or storage plugins, can be added in the future.
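One way to picture the component split is one accessor per component, where an unimplemented component is a stub that fails fast. In this hypothetical Java sketch, `Components`, `TablesComponent`, and `ViewsComponent` are illustrative names only. Throwing `UnsupportedOperationException` from the stub is an assumption about how such a stub might behave, and the table name `dfs.tmp.orders` is made up.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of a Metastore split into components; not Drill's API.
class Components {
    // A working component: stores table metadata keyed by table name.
    static class TablesComponent {
        private final Map<String, String> metadata = new HashMap<>();

        void put(String table, String info) {
            metadata.put(table, info);
        }

        Optional<String> get(String table) {
            return Optional.ofNullable(metadata.get(table));
        }
    }

    // A stub component: present in the interface but not yet implemented.
    static class ViewsComponent {
        Optional<String> get(String view) {
            throw new UnsupportedOperationException("Views component is not implemented");
        }
    }

    private final TablesComponent tables = new TablesComponent();
    private final ViewsComponent views = new ViewsComponent();

    TablesComponent tables() {
        return tables;
    }

    ViewsComponent views() {
        return views;
    }

    public static void main(String[] args) {
        Components metastore = new Components();
        metastore.tables().put("dfs.tmp.orders", "schema + statistics");
        System.out.println(metastore.tables().get("dfs.tmp.orders").orElse("none"));
        try {
            metastore.views().get("v1");
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```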
+### Metastore Tables
+The Metastore Tables component contains metadata about Drill tables, including general information as well as information about table segments, files, row groups, and partitions.
+Full table metadata consists of two major concepts: general information and top-level segments.
+Table general information contains basic table information and corresponds to the `BaseTableMetadata` class.
+A table can be non-partitioned or partitioned. Non-partitioned tables have only one top-level segment,
+which is called default (`MetadataInfo#DEFAULT_SEGMENT_KEY`). Partitioned tables may have several top-level segments.
+Each top-level segment can include metadata about inner segments, files, row groups, and partitions.
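The hypothetical Java sketch below shows how a table's files might be assigned to top-level segments. It assumes that, for a directory-partitioned table, the top-level segment key is the first directory below the table root, and that files directly under the root fall into a single default segment. `SegmentGrouper` is illustrative. Its `DEFAULT_SEGMENT` constant only mimics the role of `MetadataInfo#DEFAULT_SEGMENT_KEY`, and its value is assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical assignment of files to top-level segments; not Drill code.
class SegmentGrouper {
    // Plays the role of MetadataInfo#DEFAULT_SEGMENT_KEY; the value is an assumption.
    static final String DEFAULT_SEGMENT = "DEFAULT_SEGMENT";

    // Paths are relative to the table root, e.g. "2019/q1/part-0.parquet".
    // The first directory level is treated as the top-level segment key (assumption);
    // files directly under the root go to the default segment.
    static Map<String, List<String>> group(List<String> relativePaths) {
        Map<String, List<String>> segments = new TreeMap<>();
        for (String path : relativePaths) {
            int slash = path.indexOf('/');
            String key = slash > 0 ? path.substring(0, slash) : DEFAULT_SEGMENT;
            segments.computeIfAbsent(key, k -> new ArrayList<>()).add(path);
        }
        return segments;
    }

    public static void main(String[] args) {
        List<String> partitioned = List.of("2019/q1/a.parquet", "2019/q2/b.parquet", "2020/q1/c.parquet");
        System.out.println(group(partitioned).keySet()); // [2019, 2020]
        List<String> flat = List.of("a.parquet", "b.parquet");
        System.out.println(group(flat).keySet()); // [DEFAULT_SEGMENT]
    }
}
```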
+A unique table identifier in Metastore Tables is a combination of storage plugin, workspace, and table name.
+Table metadata is grouped by top-level segments. The unique identifier of a top-level segment and its metadata
+is the combination of storage plugin, workspace, table name, and metadata key.
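Both identifiers can be modeled as composite keys. The Java records below (`TableKey`, `SegmentKey`) are hypothetical, but they capture the rule. Two metadata entries belong to the same table when plugin, workspace, and table name all match. They belong to the same top-level segment when the metadata key matches as well.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical composite keys mirroring the identifiers described above.
class MetastoreKeys {
    // Unique table identifier: storage plugin + workspace + table name.
    record TableKey(String storagePlugin, String workspace, String tableName) {}

    // Top-level segment identifier: the table identifier plus the metadata key.
    record SegmentKey(TableKey table, String metadataKey) {}

    public static void main(String[] args) {
        TableKey orders = new TableKey("dfs", "tmp", "orders");
        Map<SegmentKey, String> metadata = new HashMap<>();
        metadata.put(new SegmentKey(orders, "2019"), "segment metadata for 2019");
        metadata.put(new SegmentKey(orders, "2020"), "segment metadata for 2020");

        // Records derive equals/hashCode from their components, so an
        // equal key built later finds the same entry.
        SegmentKey lookup = new SegmentKey(new TableKey("dfs", "tmp", "orders"), "2020");
        System.out.println(metadata.get(lookup)); // segment metadata for 2020
    }
}
```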
+### Related Session/System Options
 Review comment:
   The metastore provides a number of options to fit your environment. The default options are fine in most cases. (True?) The options are set via ...
   In general, you should set the options via ALTER SYSTEM so that they take effect for all users. Setting options at the session level is an advanced topic.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:

With regards,
Apache Git Services
