drill-user mailing list archives

From Arina Yelchiyeva <arina.yelchiy...@gmail.com>
Subject Re: [DISCUSS] case insensitive storage plugin and workspaces names
Date Wed, 13 Jun 2018 10:41:46 GMT
From the Drill code, workspaces are already case insensitive (though the
documentation states the opposite). Since there have been no complaints from
users so far, I believe there are few (if any) who use the same names
in different case.
Regarding those users who already have duplicate storage plugin names:
after the change, Drill startup will fail with an appropriate error message,
and they will have to rename those storage plugins.
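
A minimal sketch of such a startup check, written in Python rather than
Drill's Java. The function names are hypothetical and this is not Drill's
actual implementation; it only illustrates detecting names that differ
solely by case and failing with a message that lists what to rename:

```python
from collections import defaultdict


def find_case_conflicts(names):
    """Group names that collide once case is ignored (hypothetical helper)."""
    groups = defaultdict(list)
    for name in names:
        # casefold() gives a case-insensitive comparison key
        groups[name.casefold()].append(name)
    # keep only the keys shared by more than one original name
    return {key: group for key, group in groups.items() if len(group) > 1}


def validate_plugin_names(names):
    """Raise ValueError if any two names differ only by case."""
    conflicts = find_case_conflicts(names)
    if conflicts:
        details = "; ".join(", ".join(group) for group in conflicts.values())
        raise ValueError(
            "Storage plugin names must be unique ignoring case; rename: " + details
        )
```

With plugins named dfs and DFS, validate_plugin_names raises an error that
names both, so the user knows which plugins to rename before restarting.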

Kind regards,
Arina


On Tue, Jun 12, 2018 at 8:45 PM Abhishek Girish <agirish@apache.org> wrote:

> Paul, I think this proposal was specific to storage plugin and workspace
> *names*. And not for the whole of Drill.
>
> I agree it makes sense to have these names case insensitive, to improve
> user experience. The only impact to current users I can think of is if
> someone created two storage plugins dfs and DFS. Or configured workspaces
> tmp and TMP. In this case, they'd need to rename those. One thing I'm not
> clear on is how we'll handle upgrades in these cases.
>
> On Tue, Jun 12, 2018 at 10:31 AM Paul Rogers <par0328@yahoo.com.invalid>
> wrote:
>
> > Hi All,
> >
> > As it turns out, this topic has been discussed, in depth, previously.
> > Can't recall if it was on this list, or in a JIRA.
> >
> > We face a number of constraints:
> >
> > * As was noted, for some data sources, the data source itself has case
> > insensitive names. (Windows file systems, RDBMSs, etc.)
> > * In other cases, the data source itself has case sensitive names. (HDFS
> > file system, Linux file systems, JSON, etc.)
> > * SQL is defined to be case insensitive.
> > * We now have several years of user queries, in production, based on the
> > current semantics.
> >
> > Given all this, it is very likely that simply shifting to case-sensitive
> > will break existing applications.
> >
> > Perhaps a more subtle solution is to make the case-sensitivity a property
> > of the symbol that is carried through the query pipeline as another piece
> > of metadata.
> >
> > Thus, a workspace that corresponds to a DB schema would be labeled as case
> > insensitive. A workspace that corresponds to an HDFS directory would be
> > case sensitive. Names defined within Drill (as part of an AS clause) would
> > follow SQL rules and be case insensitive.
> >
> > I believe that, if we sit down and work out exactly what users would
> > expect, and what is required to handle both case sensitive and case
> > insensitive names, we'll end up with a solution not far from the above --
> > out of simple necessity.
> >
> > Thanks,
> > - Paul
> >
> >
> >
> >     On Tuesday, June 12, 2018, 8:36:01 AM PDT, Arina Yelchiyeva <
> > arina.yelchiyeva@gmail.com> wrote:
> >
> > To make it clear, we have three notions here: storage plugin name,
> > workspace (schema) and table name (dfs.root.`/tmp/t`).
> > My suggestion is the following:
> > Storage plugin names are to be case insensitive (DFS vs dfs,
> > INFORMATION_SCHEMA vs information_schema).
> > Workspace (schema) names are to be case insensitive (ROOT vs root, TMP vs
> > tmp). Even if a user has two directories, /TMP and /tmp, they can create two
> > workspaces, but not both with the name tmp. For example, tmp vs tmp_u.
> > Table name case sensitivity is handled per plugin. For example, system
> > plugin (information_schema, sys) table names (views, tables) should be
> > case insensitive. Currently, sys plugin table names are case
> > insensitive, while information_schema table names are case sensitive. That
> > needs to be synchronized. For file system plugins, table names must be case
> > sensitive, since by table name we mean a directory / file name, and
> > their case sensitivity depends on the file system.
> >
> > Kind regards,
> > Arina
> >
> > On Tue, Jun 12, 2018 at 6:13 PM Aman Sinha <amansinha@gmail.com> wrote:
> >
> > > Drill is dependent on the underlying file system's case sensitivity. On
> > > HDFS one can create 'hadoop fs -mkdir /tmp/TPCH' and /tmp/tpch, which are
> > > separate directories.
> > > These could be set as workspaces in Drill's storage plugin configuration,
> > > and we would want the ability to query both. If we change the current
> > > behavior, we would want some way, either using back-quotes ` or some other
> > > way, to support that.
> > >
> > > RDBMSs seem to have vendor-specific behavior...
> > > In MySQL [1] the database name and schema name are case-sensitive on Linux
> > > and case-insensitive on Windows, whereas Postgres converts the
> > > database name and schema name to lower-case by default but one can put
> > > double-quotes to make it case-sensitive [2].
> > >
> > > [1] https://dev.mysql.com/doc/refman/8.0/en/identifier-case-sensitivity.html
> > > [2] http://www.postgresqlforbeginners.com/2010/11/gotcha-case-sensitivity.html
> > >
> > >
> > >
> > > On Tue, Jun 12, 2018 at 5:01 AM, Arina Yelchiyeva <
> > > arina.yelchiyeva@gmail.com> wrote:
> > >
> > > > Hi all,
> > > >
> > > > Currently in Drill we treat storage plugin names and workspaces as
> > > > case-sensitive [1].
> > > > Names for storage plugins and workspaces are defined by the user. So we
> > > > allow creating plugins DFS and dfs, and workspaces tmp and TMP.
> > > > I suggest moving to a case-insensitive approach and disallowing
> > > > the creation of two plugins / workspaces with the same name in different
> > > > case, for at least the following reasons:
> > > > 1. usually RDBMS schema and table names are case insensitive and many
> > > > users are used to this approach;
> > > > 2. in Drill we have the INFORMATION_SCHEMA schema, which is in upper case,
> > > > and sys in lower case.
> > > > Personally, I find this extremely inconvenient.
> > > >
> > > > Also we should consider making table names case insensitive for system
> > > > schemas (info, sys).
> > > >
> > > > Any thoughts?
> > > >
> > > > [1] https://drill.apache.org/docs/lexical-structure/
> > > >
> > > >
> > > > Kind regards,
> > > > Arina
> > > >
> > >
> >
>
