drill-dev mailing list archives

From "Paul Rogers (Jira)" <j...@apache.org>
Subject [jira] [Created] (DRILL-7567) Metastore enhancements
Date Tue, 04 Feb 2020 03:26:00 GMT
Paul Rogers created DRILL-7567:

             Summary: Metastore enhancements
                 Key: DRILL-7567
                 URL: https://issues.apache.org/jira/browse/DRILL-7567
             Project: Apache Drill
          Issue Type: Improvement
            Reporter: Paul Rogers

The Metastore feature shipped as a Beta. Review of the documentation identified a number of
opportunities for improvement before the feature leaves Beta.

* Should the Metastore be configured in its own file? Does this push us in the direction of
each feature having its own set of config files? Or, should config move into the normal Drill
config files?
* Provide a detailed schema and description of Metadata entities, like the Hive metadata schema.
* Provide an out-of-the-box sample Metastore for some of Drill's demo tables.
* Provide a Metastore tutorial. Refer to the sample Metastore in the tutorial. Many folks
learn best by trying things hands-on.
* Solve read/write consistency issues to avoid the need for the error/recovery described for
* Boot-time config is stored in the {{drill.metastore}} namespace. But, Metastore SYSTEM/SESSION
options are in the {{drill.exec}} namespace. This is confusing. Let's be consistent.
* {{drill.exec.storage.implicit.last_modified_time.column.label}} is a bug: Drill internal
names should never conflict with user-defined column names. Figure out where they conflict and
fix the issue. No user can ever guarantee that some name will never be used in their tables. Nor
can users easily fix the issue if it occurs. (Note: this is a flaw with our implicit columns
as well.)
* Provide a form of ANALYZE TABLE that automatically reuses settings from any previous run.
Otherwise, users must find a place to store the ANALYZE TABLE command so that they can submit
exactly the same one each time, which is very unfriendly. In fact, experience with Impala
suggests that end users will have no idea about schema; they just want the latest metadata.
Such users won't even know the details of a command some other user might have submitted.
* The Iceberg metastore requires atomic rename. But, the most common use case for Drill today
is the cloud. S3 does not support atomic rename. We need to fix this.
* The documentation says we use the "plugin name" as part of the table key. But, for DFS, say,
the user can have dozens of plugin configs, each with a distinct name. Each can reuse the
same workspace name of, say, "foo". Thus "dfs/foo" is ambiguous. But "hdfs1/foo" and "local/foo"
are unique if we use storage plugin config names.
* It is not clear if the Iceberg metastore supports HDFS security and Kerberos tickets. If
not, then it won't work in a production deployment.
* The metastore is meant to store schema. A key use is when schema is ambiguous. But, metastore
gathers schema the same way that Drill queries tables. If schema is ambiguous, the ANALYZE
TABLE will fail. Thus we do not actually solve the ambiguous schema problem. We need a solution.
* Better partition support. Drill has a long-standing usability issue that users must do their
own partition coding. If I want data from 2018-11 through 2019-01 (one quarter's worth of data),
I have to write the very ugly

WHERE (dir0 = 2018 AND dir1 >= 11)
        OR (dir0 = 2019 AND dir1 <= 1)

With Hive/Impala/Presto I can just write:

WHERE transDate BETWEEN '2018-11-01' AND '2019-01-31'

* Allow staged gathering of stats. Allow me to first gather stats and review them for quality
before I have my users start using them. As it is, there is no ability to gather them, enable
the option for a session for testing, verify that things work right, then turn it on for everyone.
That is, in a shared system, all heck can break loose in the current implementation.
* Review the internal Metastore tables. See many comments about the structure in the Metastore
documentation PR.
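
To make the partition-coding complaint above concrete, here is a minimal Python sketch of the boilerplate a user (or a client tool) has to produce today for a month range. The helper name dir_predicate is hypothetical and not part of Drill; it assumes the year/month directory layout of the example, with dir0 holding the year and dir1 the month.

```python
def dir_predicate(start, end):
    """Build a WHERE fragment covering the months start..end, inclusive.

    start and end are (year, month) tuples. Assumes dir0 holds the year
    and dir1 the month. Hypothetical helper, not a Drill API.
    """
    (y1, m1), (y2, m2) = start, end
    if (y1, m1) > (y2, m2):
        raise ValueError("start must not be after end")
    if y1 == y2:
        # Range stays inside one year: a single bounded clause is enough.
        return f"(dir0 = {y1} AND dir1 >= {m1} AND dir1 <= {m2})"
    parts = [f"(dir0 = {y1} AND dir1 >= {m1})"]
    if y2 - y1 > 1:
        # Whole years strictly between the two endpoints.
        parts.append(f"(dir0 > {y1} AND dir0 < {y2})")
    parts.append(f"(dir0 = {y2} AND dir1 <= {m2})")
    return " OR ".join(parts)


print(dir_predicate((2018, 11), (2019, 1)))
```

For the November-to-January range this prints the same two-clause predicate shown in the bullet. Better partition support would make this kind of helper unnecessary.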
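
The table-key ambiguity raised above ("dfs/foo" versus "hdfs1/foo" and "local/foo") can be illustrated with a small Python sketch. The plugin configs, the table_key helper and the "prefix/workspace" layout are all hypothetical, chosen only to mirror the example in the bullet; they are not the Metastore's actual key format.

```python
from collections import Counter

# Hypothetical storage plugin configs: (config name, plugin type, workspace).
configs = [
    ("hdfs1", "dfs", "foo"),
    ("local", "dfs", "foo"),
]


def table_key(prefix, workspace):
    # Assumed key layout "<prefix>/<workspace>", for illustration only.
    return f"{prefix}/{workspace}"


# Keyed by plugin type: both configs collapse onto the same key.
by_type = Counter(table_key(ptype, ws) for _, ptype, ws in configs)
# Keyed by config name: each config gets its own key.
by_name = Counter(table_key(name, ws) for name, _, ws in configs)

print(by_type)
print(by_name)
```

Under these assumptions, keying by plugin type maps two distinct workspaces to one entry, while keying by config name keeps them apart.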

This message was sent by Atlassian Jira
