drill-user mailing list archives

From Abhishek Girish <agir...@apache.org>
Subject Re: [DISCUSS] case insensitive storage plugin and workspaces names
Date Wed, 13 Jun 2018 18:45:11 GMT
The issue is that for customers who do have such storage plugin names, it's
too late to rename them after an offline upgrade, as there is no easy way to
access the storage plugin configurations while the Drillbits are down
(because Drillbit start-up fails). It might be okay if admins perform a
rolling upgrade (newer Drillbits would fail, but older Drillbits could be
used to update the storage plugin config), but that's not fully supported.
Ideally, we'd find a way to not fail startup and instead disable the plugins
that have issues. But if that's a complex and separate task, for now we
should clearly document that this will be a breaking change on upgrade, so
users should fix their plugin names before they proceed.
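
As a rough illustration of the pre-upgrade check and the "disable instead of
fail" idea, here is a minimal Python sketch. It is not Drill code: the
function names find_case_collisions and partition_plugins, and the plain list
of plugin names, are assumptions made for illustration. How the names are
actually read from the plugin store is left out.

```python
from collections import defaultdict


def find_case_collisions(names):
    """Group names that differ only by case, e.g. 'dfs' and 'DFS'."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    # Keep only the groups where more than one name maps to the same key.
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}


def partition_plugins(names):
    """Split names into (enabled, disabled) instead of failing outright.

    Hypothetical policy: every name involved in a collision is disabled,
    so an admin can still start up and rename the offending plugins.
    """
    collisions = find_case_collisions(names)
    colliding = {name for group in collisions.values() for name in group}
    enabled = [n for n in names if n not in colliding]
    disabled = [n for n in names if n in colliding]
    return enabled, disabled


if __name__ == "__main__":
    plugins = ["dfs", "DFS", "cp", "hive"]  # illustrative names only
    print(find_case_collisions(plugins))
    print(partition_plugins(plugins))
```

Running the same check against the existing configuration before an upgrade
would tell an admin which names to change while the old Drillbits are still
up.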

On Wed, Jun 13, 2018 at 3:42 AM Arina Yelchiyeva <arina.yelchiyeva@gmail.com>
wrote:

> According to the Drill code, workspaces are already case insensitive (though
> the documentation states the opposite). Since there have been no complaints
> from users so far, I believe there are not many (if any) who use the same
> names in different case.
> Regarding users who already have duplicate storage plugin names: after the
> change, Drill start-up will fail with an appropriate error message and they
> will have to rename those storage plugins.
>
> Kind regards,
> Arina
>
>
> On Tue, Jun 12, 2018 at 8:45 PM Abhishek Girish <agirish@apache.org>
> wrote:
>
> > Paul, I think this proposal was specific to storage plugin and workspace
> > *names*. And not for the whole of Drill.
> >
> > I agree it makes sense to have these names case insensitive, to improve
> > user experience. The only impact to current users I can think of is if
> > someone created two storage plugins dfs and DFS. Or configured workspaces
> > tmp and TMP. In this case, they'd need to rename those. One thing I'm not
> > clear on is how we'll handle upgrades in these cases.
> >
> > On Tue, Jun 12, 2018 at 10:31 AM Paul Rogers <par0328@yahoo.com.invalid>
> > wrote:
> >
> > > Hi All,
> > >
> > > As it turns out, this topic has been discussed, in depth, previously.
> > > Can't recall if it was on this list, or in a JIRA.
> > >
> > > We face a number of constraints:
> > >
> > > * As was noted, for some data sources, the data source itself has case
> > > insensitive names. (Windows file systems, RDBMSs, etc.)
> > > * In other cases, the data source itself has case sensitive names.
> > > (HDFS file system, Linux file systems, JSON, etc.)
> > > * SQL is defined to be case insensitive.
> > > * We now have several years of user queries, in production, based on
> > > the current semantics.
> > >
> > > Given all this, it is very likely that simply shifting to
> > > case-sensitive will break existing applications.
> > >
> > > Perhaps a more subtle solution is to make the case-sensitivity a
> > > property of the symbol that is carried through the query pipeline as
> > > another piece of metadata.
> > >
> > > Thus, a workspace that corresponds to a DB schema would be labeled as
> > > case insensitive. A workspace that corresponds to an HDFS directory
> > > would be case sensitive. Names defined within Drill (as part of an AS
> > > clause), would follow SQL rules and be case insensitive.
> > >
> > > I believe that, if we sit down and work out exactly what users would
> > > expect, and what is required to handle both case sensitive and case
> > > insensitive names, we'll end up with a solution not far from the above
> > > -- out of simple necessity.
> > >
> > > Thanks,
> > > - Paul
> > >
> > >
> > >
> > >     On Tuesday, June 12, 2018, 8:36:01 AM PDT, Arina Yelchiyeva <
> > > arina.yelchiyeva@gmail.com> wrote:
> > >
> > > > To make it clear, we have three notions here: storage plugin name,
> > > > workspace (schema) and table name (dfs.root.`/tmp/t`).
> > > > My suggestion is the following:
> > > > Storage plugin names should be case insensitive (DFS vs dfs,
> > > > INFORMATION_SCHEMA vs information_schema).
> > > > Workspace (schema) names should be case insensitive (ROOT vs root, TMP
> > > > vs tmp). Even if a user has two directories, /TMP and /tmp, they can
> > > > create two workspaces, but not both with the name tmp. For example,
> > > > tmp vs tmp_u.
> > > > Table name case sensitivity should be handled per plugin. For example,
> > > > for system plugins (information_schema, sys), table names (views,
> > > > tables) should be case insensitive. Currently, sys plugin table names
> > > > are case insensitive, while information_schema table names are case
> > > > sensitive. That needs to be synchronized. For file system plugins,
> > > > table names must be case sensitive, since by table name we mean a
> > > > directory / file name, and their case sensitivity depends on the file
> > > > system.
> > >
> > > Kind regards,
> > > Arina
> > >
> > > > On Tue, Jun 12, 2018 at 6:13 PM Aman Sinha <amansinha@gmail.com>
> > > > wrote:
> > >
> > > > > Drill is dependent on the underlying file system's case sensitivity.
> > > > > On HDFS one can create 'hadoop fs -mkdir /tmp/TPCH' and /tmp/tpch,
> > > > > which are separate directories.
> > > > > These could be set as workspaces in Drill's storage plugin
> > > > > configuration and we would want the ability to query both. If we
> > > > > change the current behavior, we would want some way, either using
> > > > > back-quotes ` or some other way, to support that.
> > > > >
> > > > > RDBMSs seem to have vendor-specific behavior...
> > > > > In MySQL [1] the database name and schema name are case-sensitive on
> > > > > Linux and case-insensitive on Windows. Postgres, by contrast, converts
> > > > > the database name and schema name to lower case by default, but one
> > > > > can put them in double quotes to make them case-sensitive [2].
> > > >
> > > > [1]
> > > >
> > https://dev.mysql.com/doc/refman/8.0/en/identifier-case-sensitivity.html
> > > > [2]
> > > >
> > >
> >
> http://www.postgresqlforbeginners.com/2010/11/gotcha-case-sensitivity.html
> > > >
> > > >
> > > >
> > > > On Tue, Jun 12, 2018 at 5:01 AM, Arina Yelchiyeva <
> > > > arina.yelchiyeva@gmail.com> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > > Currently in Drill we treat storage plugin names and workspaces as
> > > > > > case-sensitive [1].
> > > > > > Names for storage plugins and workspaces are defined by the user, so
> > > > > > we allow creating plugin -> DFS and dfs, workspace -> tmp and TMP.
> > > > > > I suggest moving to a case-insensitive approach and disallowing
> > > > > > two plugins / workspaces with the same name in different case, for
> > > > > > at least the following reasons:
> > > > > > 1. usually RDBMS schema and table names are case insensitive and
> > > > > > many users are used to this approach;
> > > > > > 2. in Drill we have the INFORMATION_SCHEMA schema, which is in upper
> > > > > > case, and sys, which is in lower case.
> > > > > > Personally, I find this extremely inconvenient.
> > > > > >
> > > > > > Also we should consider making table names case insensitive for
> > > > > > system schemas (info, sys).
> > > > >
> > > > > Any thoughts?
> > > > >
> > > > > [1] https://drill.apache.org/docs/lexical-structure/
> > > > >
> > > > >
> > > > > Kind regards,
> > > > > Arina
> > > > >
> > > >
> > >
> >
>
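
Paul's suggestion earlier in the thread, carrying case sensitivity as a
property of each symbol, might be sketched as follows. This is only an
illustration under assumptions: the Symbol class, its matches method, and the
example names are hypothetical and do not correspond to any Drill API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    """A name plus a flag saying how it should be compared (hypothetical)."""
    text: str
    case_sensitive: bool

    def matches(self, candidate: str) -> bool:
        # Case-sensitive symbols need an exact match; others compare folded.
        if self.case_sensitive:
            return self.text == candidate
        return self.text.casefold() == candidate.casefold()


# A workspace backed by a DB schema: case insensitive ("sales" is made up).
db_schema = Symbol("sales", case_sensitive=False)
# A workspace backed by an HDFS directory: case sensitive.
hdfs_dir = Symbol("/tmp/TPCH", case_sensitive=True)
# A name defined in an AS clause: follows SQL rules, case insensitive.
alias = Symbol("t", case_sensitive=False)
```

With this shape, the comparison rule travels with the name, so each stage of
the query pipeline can call matches without knowing which data source the
name came from.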
