spark-dev mailing list archives

From Wenchen Fan <cloud0...@gmail.com>
Subject Re: SQL DDL statements with replacing default catalog with custom catalog
Date Wed, 07 Oct 2020 10:48:21 GMT
If you just want to save typing the catalog name when writing table names,
you can set your custom catalog as the default catalog (See
SQLConf.DEFAULT_CATALOG). SQLConf.V2_SESSION_CATALOG_IMPLEMENTATION is used
to extend the v1 session catalog, not replace it.
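
As a rough sketch of the first option, the settings could be assembled like this. The catalog name `my_catalog`, the class `com.example.MyCatalog`, and both helper functions are hypothetical placeholders for illustration, not part of Spark. The two configuration keys are the ones behind SQLConf.DEFAULT_CATALOG and SQLConf.V2_SESSION_CATALOG_IMPLEMENTATION.

```python
# Key behind SQLConf.DEFAULT_CATALOG.
DEFAULT_CATALOG_KEY = "spark.sql.defaultCatalog"
# Key behind SQLConf.V2_SESSION_CATALOG_IMPLEMENTATION. It is deliberately
# left unset here, so the built-in session catalog is not replaced.
SESSION_CATALOG_KEY = "spark.sql.catalog.spark_catalog"


def separate_default_catalog(name, impl_class):
    """Register a custom catalog under its own name and make it the default."""
    return {
        f"spark.sql.catalog.{name}": impl_class,
        DEFAULT_CATALOG_KEY: name,
    }


def to_submit_args(conf):
    """Render the settings as spark-submit style --conf arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


# "my_catalog" and "com.example.MyCatalog" are hypothetical names.
conf = separate_default_catalog("my_catalog", "com.example.MyCatalog")
print(" ".join(to_submit_args(conf)))
```

With these settings, 'dbName.tableName' would resolve against `my_catalog`, while `spark_catalog` keeps its built-in implementation.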

On Wed, Oct 7, 2020 at 5:36 PM Jungtaek Lim <kabhwan.opensource@gmail.com>
wrote:

> If it's by design and not ready yet, then IMHO replacing the default
> session catalog should be restricted until things are sorted out, as
> it causes a lot of confusion and has known bugs. Actually there's another
> bug/limitation in the default session catalog on the length of identifiers,
> so things that work with a custom catalog no longer work when it replaces
> the default session catalog.
>
> On Wed, Oct 7, 2020 at 6:05 PM Wenchen Fan <cloud0fan@gmail.com> wrote:
>
>> Ah, this is by design. V1 tables should still go through the v1 session
>> catalog. I think we can remove this restriction when we are confident about
>> the new v2 DDL commands that work with v2 catalog APIs.
>>
>> On Wed, Oct 7, 2020 at 5:00 PM Jungtaek Lim <kabhwan.opensource@gmail.com>
>> wrote:
>>
>>> My case is DROP TABLE, which supports both v1 and v2 (it simply
>>> works when I use a custom catalog without replacing the default
>>> catalog).
>>>
>>> It only fails on v2 when the "default catalog" is replaced (say I
>>> replace 'spark_catalog'), because TempViewOrV1Table matches even
>>> for a v2 table, and then Catalyst goes with the v1 exec. I guess all
>>> commands relying on TempViewOrV1Table to determine whether the table is
>>> v1 or v2 would suffer from this issue.
>>>
>>> On Wed, Oct 7, 2020 at 5:45 PM Wenchen Fan <cloud0fan@gmail.com> wrote:
>>>
>>>> Not all the DDL commands support v2 catalog APIs (e.g. CREATE TABLE
>>>> LIKE), so it's possible that some commands still go through the v1 session
>>>> catalog although you configured a custom v2 session catalog.
>>>>
>>>> Can you create JIRA tickets if you hit any DDL commands that don't
>>>> support v2 catalog? We should fix them.
>>>>
>>>> On Wed, Oct 7, 2020 at 9:15 AM Jungtaek Lim <
>>>> kabhwan.opensource@gmail.com> wrote:
>>>>
>>>>> The logical plan for the parsed statement is converted to either the
>>>>> old (v1) one or the v2 one, and the former keeps using an external
>>>>> catalog (Hive). So replacing the default session catalog with a custom
>>>>> one and trying to use it as if it were the external catalog doesn't
>>>>> work, which defeats the purpose of replacing the default session catalog.
>>>>>
>>>>> Btw I see one approach: in TempViewOrV1Table, if it matches
>>>>> with SessionCatalogAndIdentifier where the catalog is TableCatalog, call
>>>>> loadTable in catalog and see whether it's V1 table or not. Not sure it's
>>>>> a viable approach though, as it requires loading a table during
>>>>> resolution of the table identifier.
>>>>>
>>>>> On Wed, Oct 7, 2020 at 10:04 AM Ryan Blue <rblue@netflix.com> wrote:
>>>>>
>>>>>> I've hit this with `DROP TABLE` commands that should be passed to a
>>>>>> registered v2 session catalog, but are handled by v1. I think that's the
>>>>>> only case we hit in our downstream test suites, but we haven't been
>>>>>> exploring the use of a session catalog for fallback. We use v2 for
>>>>>> everything now, which avoids the problem and comes with multi-catalog
>>>>>> support.
>>>>>>
>>>>>> On Tue, Oct 6, 2020 at 5:55 PM Jungtaek Lim <
>>>>>> kabhwan.opensource@gmail.com> wrote:
>>>>>>
>>>>>>> Hi devs,
>>>>>>>
>>>>>>> I'm not sure whether it's addressed in Spark 3.1, but at least from
>>>>>>> Spark 3.0.1, many SQL DDL statements don't seem to go through the custom
>>>>>>> catalog when I replace default catalog with custom catalog and only
>>>>>>> provide 'dbName.tableName' as table identifier.
>>>>>>>
>>>>>>> I'm not an expert in this area, but after skimming the code I feel
>>>>>>> TempViewOrV1Table looks to be broken for the case, as it can still be a
>>>>>>> V2 table. Classifying the table identifier to either V2 table or "temp
>>>>>>> view or v1 table" looks to be mandatory, as former and latter have
>>>>>>> different code paths and different catalog interfaces.
>>>>>>>
>>>>>>> That sounds to me like a dead end, and the only "clear" approach seems
>>>>>>> to be disallowing replacement of the default catalog with a custom one.
>>>>>>> Am I missing something?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Jungtaek Lim (HeartSaVioR)
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Ryan Blue
>>>>>> Software Engineer
>>>>>> Netflix
>>>>>>
>>>>>
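
The routing problem discussed in this thread can be sketched as a toy model. This is a simplified illustration, not Spark's actual resolution code: `classify`, `Route`, and the catalog name `my_catalog` are hypothetical, and temp views are ignored. The assumption it encodes, taken from the messages above, is that any identifier resolving to 'spark_catalog' is sent down the v1 code path, whatever implementation backs that catalog.

```python
from dataclasses import dataclass

SESSION_CATALOG = "spark_catalog"  # name of the built-in session catalog


@dataclass(frozen=True)
class Route:
    catalog: str    # catalog the identifier resolves to
    table: tuple    # remaining name parts (database, table)
    code_path: str  # "v1" or "v2"


def classify(name_parts, catalogs, default_catalog=SESSION_CATALOG):
    """Toy model: choose the code path from the identifier alone."""
    parts = tuple(name_parts)
    if len(parts) > 1 and parts[0] in catalogs:
        # The first name part is a registered catalog name.
        catalog, table = parts[0], parts[1:]
    else:
        # No catalog prefix: fall back to the default catalog.
        catalog, table = default_catalog, parts
    # Nothing is loaded from the catalog here, so a v2 table served by a
    # replaced session catalog cannot be told apart from a v1 table.
    code_path = "v1" if catalog == SESSION_CATALOG else "v2"
    return Route(catalog, table, code_path)


catalogs = {SESSION_CATALOG, "my_catalog"}  # "my_catalog" is hypothetical

# Session catalog replaced, identifier given as 'db.tbl': v1 path.
replaced = classify(["db", "tbl"], catalogs)
# Custom catalog named explicitly: v2 path.
qualified = classify(["my_catalog", "db", "tbl"], catalogs)
# Custom catalog set as the default catalog: v2 path.
defaulted = classify(["db", "tbl"], catalogs, default_catalog="my_catalog")
print(replaced.code_path, qualified.code_path, defaulted.code_path)
```

In this model only the first case takes the v1 path, which matches the reports above that the same commands work once the custom catalog is registered under its own name.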
