calcite-dev mailing list archives

From Andrei Sereda <and...@sereda.cc>
Subject Re: Elasticsearch Adapter. Removal of Mapping Types (by vendor). Index == Table
Date Sat, 30 Jun 2018 14:43:30 GMT
Christian / Michael,

Can you please weigh in with your preferred solution, and I'll implement it.

One more question. Sometimes it is useful to be able to filter (limit) the
indexes (tables) exposed by Calcite. Say my cluster has 10 indexes but I
want users to query only one. Would you be opposed if I added a configuration
parameter that lets you specify a filter (e.g. a regexp) for ES indexes?
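
As a rough illustration of the filtering idea (a sketch in Python rather than
the adapter's Java; the function name and the `pattern` parameter are
hypothetical and do not correspond to an existing Calcite option):

```python
import re


def filter_indexes(index_names, pattern=None):
    """Return the index names that would be exposed as tables.

    `pattern` stands in for the proposed (hypothetical) adapter parameter.
    When it is None, every index is exposed.
    """
    if pattern is None:
        return list(index_names)
    regex = re.compile(pattern)
    # fullmatch: the whole index name must match, not just a prefix of it
    return [name for name in index_names if regex.fullmatch(name)]


# Made-up index names, for illustration only
cluster_indexes = ["orders", "customers", "logs-2018.06.29", "logs-2018.06.30"]
exposed = filter_indexes(cluster_indexes, r"logs-.*")
```

Whether the pattern should match the whole name (as above) or any substring
is a design choice that would need to be agreed on.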

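For reference, the separator problem raised in the quoted discussion further
down can be reproduced in a few lines. This is only a sketch in Python;
`table_name` and its template handling are hypothetical and are not the
adapter's code:

```python
def table_name(index, es_type, template="${index}_${type}"):
    """Derive a table name by plain string replacement of the placeholders."""
    return template.replace("${index}", index).replace("${type}", es_type)


# Two distinct (index, type) pairs from the example in the thread
a = table_name("x_y", "z")
b = table_name("x", "y_z")
# With '_' as the separator both pairs map to the same table name,
# so the name cannot be split back into index and type unambiguously.
collides = a == b

# With a separator that occurs in neither name (here '$'), they stay distinct.
c = table_name("x_y", "z", template="${index}$${type}")
d = table_name("x", "y_z", template="${index}$${type}")
```
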

On Fri, Jun 29, 2018 at 11:17 PM Andrei Sereda <andrei@sereda.cc> wrote:

> That's a reasonable alternative.
>
> On Fri, Jun 29, 2018 at 7:57 PM Julian Hyde <jhyde@apache.org> wrote:
>
>> Maybe there could be a separator char as one of the adapter’s parameters.
>> People should choose a value, say ‘$’ or ‘#’, that is legal in an unquoted
>> SQL identifier but does not occur in any of their index or type names.
>>
>> If not specified, the adapter would end up in a simple mode, say looking
>> for indexes first, then looking for types, and people would need to make
>> sure indexes and types have distinct names. After the transition to
>> single-type indexes, people could stop using the parameter.
>>
>> Julian
>>
>>
>> > On Jun 29, 2018, at 4:43 PM, Andrei Sereda <andrei@sereda.cc> wrote:
>> >
>> > That's a valid point. Then the user would define a different pattern, like
>> > "i$index_t$type", for their cluster.
>> >
>> > I think we should first answer whether such scenarios should be supported
>> > by Calcite (given that they're already deprecated by the vendor). If yes,
>> > what should the collision strategy be? A user-defined pattern like the one
>> > above, failure, or an auto-generated name?
>> >
>> > On Fri, Jun 29, 2018, 19:14 Julian Hyde <jhyde@apache.org> wrote:
>> >
>> >>> In Elastic, the (index/type) pair is guaranteed to be unique, therefore
>> >>> "${index}_${type}" will also be unique (as a string). This is only
>> >>> necessary when we have several types per index. A valid question is
>> >>> whether the user should be allowed such flexibility.
>> >>
>> >> Uniqueness is not my concern.
>> >>
>> >> Suppose there is an index called "x_y" with a type called "z", and
>> >> another index called "x" with a type called "y_z". If I write "x_y_z"
>> >> it's not clear how it should be broken into index/type.
>> >>
>> >>
>> >> On Fri, Jun 29, 2018 at 3:15 PM, Andrei Sereda <andrei@sereda.cc> wrote:
>> >>>> Can you show how those examples affect SQL against the ES adapter
>> >>>> and/or how they affect JSON models?
>> >>>
>> >>> The discussion is about how to properly bridge the (index/type) concept
>> >>> from ES into the relational world. The proposal to use placeholders
>> >>> ($index / $type) affects only how a table is named in Calcite. They're
>> >>> not used as SQL literals, i.e. it affects only the configuration phase
>> >>> of the schema. Pretty much, we're doing a string replace to derive the
>> >>> table name from ($index/$type).
>> >>>
>> >>>> You seem to be using '_' as a separator character. Are we sure that
>> >>>> people will never use it in index or type name? Separator characters
>> >>>> often cause problems.
>> >>> In Elastic, the (index/type) pair is guaranteed to be unique, therefore
>> >>> "${index}_${type}" will also be unique (as a string). This is only
>> >>> necessary when we have several types per index. A valid question is
>> >>> whether the user should be allowed such flexibility.
>> >>>
>> >>>
>> >>>
>> >>> On Fri, Jun 29, 2018 at 2:19 PM Julian Hyde <jhyde@apache.org> wrote:
>> >>>
>> >>>> Andrei,
>> >>>>
>> >>>> I'm not an ES user so I don't fully understand this issue, but my two
>> >>>> cents anyway...
>> >>>>
>> >>>> Can you show how those examples affect SQL against the ES adapter
>> >>>> and/or how they affect JSON models?
>> >>>>
>> >>>> You seem to be using '_' as a separator character. Are we sure that
>> >>>> people will never use it in index or type name? Separator characters
>> >>>> often cause problems.
>> >>>>
>> >>>> Julian
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Fri, Jun 29, 2018 at 10:58 AM, Andrei Sereda <andrei@sereda.cc> wrote:
>> >>>>> I agree there should be a configuration option. How about the following
>> >>>>> approach?
>> >>>>>
>> >>>>> Expose both variables ${index} and ${type} in the configuration (JSON),
>> >>>>> and the user will use them to generate the table name in the Calcite
>> >>>>> schema.
>> >>>>>
>> >>>>> Example
>> >>>>> "table_name": "${type}" // current
>> >>>>> "table_name": "${index}" // new (default?)
>> >>>>> "table_name": "${index}_${type}" // most generic; supports multiple types per index
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> On Fri, Jun 29, 2018 at 9:26 AM Michael Mior <mmior@apache.org> wrote:
>> >>>>>
>> >>>>>> I think it sounds like you and Andrei are in a good position to tackle
>> >>>>>> this one, so I'm happy to have you both work on whatever solution you
>> >>>>>> think is best.
>> >>>>>>
>> >>>>>> --
>> >>>>>> Michael Mior
>> >>>>>> mmior@apache.org
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> On Fri, Jun 29, 2018 at 04:19, Christian Beikov
>> >>>>>> <christian.beikov@gmail.com> wrote:
>> >>>>>>
>> >>>>>>> IMO the best solution would be to make it configurable by introducing a
>> >>>>>>> "table_mapping" config with values
>> >>>>>>>
>> >>>>>>>  * type - every type in the known indices is mapped as a table
>> >>>>>>>  * index - every known index is mapped as a table
>> >>>>>>>
>> >>>>>>> We'd probably also need a "type_field" configuration for defining which
>> >>>>>>> field to use for type determination, as one of the possible future
>> >>>>>>> ways to do things is to introduce a custom field:
>> >>>>>>>
>> >>>>>>> https://www.elastic.co/guide/en/elasticsearch/reference/master/removal-of-types.html#_custom_type_field_2
>> >>>>>>>
>> >>>>>>> We already detect the ES version, so we can set a smart default for this
>> >>>>>>> setting. Let's make the index config param optional.
>> >>>>>>>
>> >>>>>>>  * When no index is given, we discover indexes; the default for
>> >>>>>>>    "table_mapping" then is "index"
>> >>>>>>>  * When an index is given, we only discover types according to the
>> >>>>>>>    "type_field" configuration, and the default for "table_mapping" is
>> >>>>>>>    "type"
>> >>>>>>>
>> >>>>>>> This would also allow discovering indexes while still using "type" as
>> >>>>>>> "table_mapping".
>> >>>>>>>
>> >>>>>>> What do you think?
>> >>>>>>>
>> >>>>>>> Kind regards,
>> >>>>>>>
>> >>>>>>> ------------------------------------------------------------------------
>> >>>>>>> *Christian Beikov*
>> >>>>>>> On 29.06.2018 at 02:41, Andrei Sereda wrote:
>> >>>>>>>> Yes. There is an API to list all indexes / types in Elastic. They can be
>> >>>>>>>> automatically imported into a schema.
>> >>>>>>>>
>> >>>>>>>> What needs to be agreed upon is how to expose those elements in the
>> >>>>>>>> Calcite schema (naming / behaviour).
>> >>>>>>>>
>> >>>>>>>> 1) Many (most?) setups are single type per index. The natural way to name
>> >>>>>>>> them would be "elastic.$index" (elastic being the schema name). Multiple
>> >>>>>>>> indexes would be under the same schema: "elastic.index1", "elastic.index2",
>> >>>>>>>> etc.
>> >>>>>>>>
>> >>>>>>>> 2) What if an index has several types? Should they be exported as Calcite
>> >>>>>>>> tables "elastic.$index_type1" and "elastic.$index_type2"? Or (current
>> >>>>>>>> behaviour) as "elastic.type1" and "elastic.type2"? Or as a subschema,
>> >>>>>>>> "elastic.$index.type1"?
>> >>>>>>>>
>> >>>>>>>> Now what if one has a combination of (1) and (2)?
>> >>>>>>>> Setup (2) is already deprecated (and will be unsupported in the next
>> >>>>>>>> version).
>> >>>>>>>>
>> >>>>>>>> On Thu, Jun 28, 2018 at 7:31 PM Christian Beikov
>> >>>>>>>> <christian.beikov@gmail.com> wrote:
>> >>>>>>>>
>> >>>>>>>>> Is there an API to discover indexes? If there is, I'd suggest we allow a
>> >>>>>>>>> config option that makes the adapter discover the possible indexes.
>> >>>>>>>>> We'd still have to adapt the code a bit, but internally, the schema
>> >>>>>>>>> could just keep a cache mapping type name to index name and be able to
>> >>>>>>>>> support both scenarios.
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> Kind regards,
>> >>>>>>>>>
>> >>>>>>>>> ------------------------------------------------------------------------
>> >>>>>>>>> *Christian Beikov*
>> >>>>>>>>> On 29.06.2018 at 00:12, Andrei Sereda wrote:
>> >>>>>>>>>>> 1) What's the time horizon for the current adapter no longer working with
>> >>>>>>>>>>> these changes to ES?
>> >>>>>>>>>> The current adapter will keep working for a while with existing setups. The
>> >>>>>>>>>> problem is nomenclature and ease of use.
>> >>>>>>>>>>
>> >>>>>>>>>> Their new SQL concepts mapping
>> >>>>>>>>>> <https://www.elastic.co/guide/en/elasticsearch/reference/current/_mapping_concepts_across_sql_and_elasticsearch.html>
>> >>>>>>>>>> drops the notion of ES type (which used to be the equivalent of an RDBMS
>> >>>>>>>>>> table) and uses the ES index as the new table equivalent (previously an ES
>> >>>>>>>>>> index corresponded to a database). Most users use Elastic this way (one
>> >>>>>>>>>> type, one index): index == table.
>> >>>>>>>>>>
>> >>>>>>>>>> Currently Calcite requires a schema per index; in RDBMS parlance, a
>> >>>>>>>>>> database per table (I'd like to change that).
>> >>>>>>>>>>
>> >>>>>>>>>>> 2) Any guess how complicated it would be to maintain code paths for both
>> >>>>>>>>>>> behaviours? I know this is probably really challenging to estimate, but I
>> >>>>>>>>>>> really have no idea of the scope of these changes. Would it mean two
>> >>>>>>>>>>> different ES adapters?
>> >>>>>>>>>> One can simply have separate Calcite schema implementations (same adapter /
>> >>>>>>>>>> module):
>> >>>>>>>>>> 1)  LegacySchema (old). A schema can have only one index (but multiple
>> >>>>>>>>>> types). Type == table in this case.
>> >>>>>>>>>> 2)  NewSchema (new). A single schema can have multiple indexes (type is
>> >>>>>>>>>> dropped). Index == table in this case.
>> >>>>>>>>>>
>> >>>>>>>>>>> 3) Do we really need compatibility with the current version of the adapter?
>> >>>>>>>>>>> IMO this depends on what versions of ES we would lose support for and how
>> >>>>>>>>>>> complex it would be for users of the current ES adapter to make updates for
>> >>>>>>>>>>> any Calcite API changes.
>> >>>>>>>>>> The issue is not in the adapter but in how the Calcite schema exposes tables.
>> >>>>>>>>>> Should it expose each index as an individual table (new), or each ES type
>> >>>>>>>>>> (old)?
>> >>>>>>>>>>
>> >>>>>>>>>> Andrei.
>> >>>>>>>>>>
>> >>>>>>>>>> On Thu, Jun 28, 2018 at 5:23 PM Michael Mior <mmior@apache.org> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>> Unfortunately I know very little about ES so I'm not in a great position to
>> >>>>>>>>>>> assess the impact of these changes. I will say that legacy compatibility is
>> >>>>>>>>>>> great, but maintaining two sets of logic is always a challenge. A few
>> >>>>>>>>>>> follow-up questions:
>> >>>>>>>>>>>
>> >>>>>>>>>>> 1) What's the time horizon for the current adapter no longer working with
>> >>>>>>>>>>> these changes to ES?
>> >>>>>>>>>>>
>> >>>>>>>>>>> 2) Any guess how complicated it would be to maintain code paths for both
>> >>>>>>>>>>> behaviours? I know this is probably really challenging to estimate, but I
>> >>>>>>>>>>> really have no idea of the scope of these changes. Would it mean two
>> >>>>>>>>>>> different ES adapters?
>> >>>>>>>>>>>
>> >>>>>>>>>>> 3) Do we really need compatibility with the current version of the adapter?
>> >>>>>>>>>>> IMO this depends on what versions of ES we would lose support for and how
>> >>>>>>>>>>> complex it would be for users of the current ES adapter to make updates for
>> >>>>>>>>>>> any Calcite API changes.
>> >>>>>>>>>>>
>> >>>>>>>>>>> Thanks for your continued work on the ES adapter Andrei!
>> >>>>>>>>>>>
>> >>>>>>>>>>> --
>> >>>>>>>>>>> Michael Mior
>> >>>>>>>>>>> mmior@apache.org
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Thu, Jun 28, 2018 at 12:57, Andrei Sereda <andrei@sereda.cc> wrote:
>> >>>>>>>>>>>> Hello,
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Elastic announced
>> >>>>>>>>>>>> <https://www.elastic.co/guide/en/elasticsearch/reference/master/removal-of-types.html>
>> >>>>>>>>>>>> that they will be deprecating mapping types in ES6 and indexes will be
>> >>>>>>>>>>>> single-typed only.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> The historical analogy <https://www.elastic.co/blog/index-vs-type> between
>> >>>>>>>>>>>> RDBMS and Elastic was that an index is equivalent to a database and a type
>> >>>>>>>>>>>> corresponds to a table in that database. In a couple of releases (ES6-8)
>> >>>>>>>>>>>> this will no longer be true.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> The recent SQL addition
>> >>>>>>>>>>>> <https://www.elastic.co/blog/elasticsearch-6-3-0-released> to Elastic
>> >>>>>>>>>>>> confirms this trend
>> >>>>>>>>>>>> <https://www.elastic.co/guide/en/elasticsearch/reference/current/_mapping_concepts_across_sql_and_elasticsearch.html>.
>> >>>>>>>>>>>> An index is equivalent to a table and there are no more ES types.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> I would like to propose including this logic in the Calcite ES adapter,
>> >>>>>>>>>>>> i.e. exposing each ES single-typed index as a separate table inside the
>> >>>>>>>>>>>> Calcite schema. This is in contrast to the current integration, where a
>> >>>>>>>>>>>> schema can only have a single index. The current approach forces you to
>> >>>>>>>>>>>> create multiple schemas to query single-typed indexes (on the same ES
>> >>>>>>>>>>>> cluster).
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Legacy compatibility can always be controlled with configuration
>> >>>>>>>>>>>> parameters.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Do you agree with such changes? If yes, would you consider a PR?
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Regards,
>> >>>>>>>>>>>> Andrei.
>> >>>>>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>
>> >>
>>
>>
