ignite-dev mailing list archives

From Denis Magda <dma...@apache.org>
Subject Re: Calcite based SQL query engine. Local queries
Date Thu, 07 Nov 2019 22:48:20 GMT
Folks,

Think of our compute tasks as an advanced version of stored procedures that
lets users implement logic of any complexity in Java, .NET or C++ (rather
than in PL/SQL). The logic can use a combination of APIs (key-value, SQL,
etc.) to access data both locally and remotely while being executed on
server nodes. The logic can make N key-value requests or run M SQL queries.

We have kept supporting local SQL queries precisely for such scenarios (our
version of stored procedures) so that the distributed map-reduce phase is
skipped when all the data is local. The affinityCalls were later improved
to pin the partitions.

If the new engine is smart enough to understand that all the partitions are
available locally during the affinityRun execution, then it's totally fine
to remove the 'local' flag. Otherwise, we need to tell the engine manually
that a distributed phase is redundant, via the 'local' flag or by other
means.

Does it make things clearer?


-
Denis
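
A minimal, hypothetical sketch of the decision described in this message, written in plain Java rather than against the Ignite API. It assumes the engine knows which partitions a query needs and which partitions are pinned on the local node for the duration of the task; the names `LocalExecutionSketch`, `Mode` and `chooseMode` are illustrative only and are not part of Ignite or Calcite.

```java
import java.util.Set;

/** Toy model (not Ignite code): decides whether the distributed phase is redundant. */
final class LocalExecutionSketch {
    enum Mode { LOCAL, DISTRIBUTED }

    /**
     * @param required      partitions the query has to read (assumed to be known to the planner)
     * @param pinnedLocally partitions pinned on this node while the compute task runs
     */
    static Mode chooseMode(Set<Integer> required, Set<Integer> pinnedLocally) {
        // If every required partition is held locally, the map-reduce phase can be skipped.
        return pinnedLocally.containsAll(required) ? Mode.LOCAL : Mode.DISTRIBUTED;
    }

    public static void main(String[] args) {
        System.out.println(chooseMode(Set.of(1, 2), Set.of(1, 2, 3)));
        System.out.println(chooseMode(Set.of(1, 4), Set.of(1, 2, 3)));
    }
}
```

When an engine cannot make this check on its own, an explicit 'local' flag amounts to the caller asserting the `LOCAL` outcome by hand.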


On Thu, Nov 7, 2019 at 3:53 AM Ivan Pavlukhin <vololo100@gmail.com> wrote:

> Stephen,
>
> In my understanding we need to do a better job to realize use-cases of
> Compute + LocalSQL ourselves.
>
> Ideally, a smart optimizer should do the best job of query deployment.
>
> Thu, 7 Nov 2019 at 13:04, Stephen Darlington
> <stephen.darlington@gridgain.com>:
> >
> > I made a (bad) assumption that this would also affect queries against
> > partitions. If “setLocal()” goes away but “setPartitions()” remains I’m
> > happy.
> >
> > What I would say is that the “broadcast / local” method is one I see
> > fairly often. Do we need to do a better job educating people of the
> > “correct” way?
> >
> > Regards,
> > Stephen
> >
> > > On 7 Nov 2019, at 08:30, Alexey Goncharuk <alexey.goncharuk@gmail.com> wrote:
> > >
> > > Denis, Stephen,
> > >
> > > Running a local query in a broadcast closure won't work on changing
> > > topology. We specifically added an affinityCall method to the compute API
> > > in order to pin a partition to prevent its moving and eviction throughout
> > > the task execution. Therefore, the query inside an affinityCall is always
> > > executed against some partitions (otherwise the query may give incorrect
> > > results when topology is changed).
> > >
> > > I support Igor's question and think that the 'local' flag for the query
> > > should be deprecated and eventually removed. A 'local' query can always be
> > > expressed as a query against a set of partitions. If those partitions are
> > > located on the same node, good: we get fast and correct results. If not,
> > > we may either raise an exception and ask the user to remap the query, or
> > > fall back to distributed query execution.
> > >
> > > Given that the Calcite prototype is in its early stages, it's likely its
> > > first version will be available in 3.x, and it's a good chance to get rid
> > > of wrong API pieces.
> > >
> > > --AG
> > >
> > > Mon, 4 Nov 2019 at 14:02, Stephen Darlington <
> > > stephen.darlington@gridgain.com>:
> > >
> > >> A common use case is where you want to work on many rows of data across
> > >> the grid. You’d broadcast a closure, running the same code on every node
> > >> with just the local data. SQL doesn’t work in isolation; it’s often used
> > >> as a filter for future computations.
> > >>
> > >> Regards,
> > >> Stephen
> > >>
> > >>> On 1 Nov 2019, at 17:53, Ivan Pavlukhin <vololo100@gmail.com> wrote:
> > >>>
> > >>> Denis,
> > >>>
> > >>> I am mostly concerned about gathering use cases. It would be great to
> > >>> critically assess such cases to identify why they cannot be solved by
> > >>> using distributed SQL. Also, it sounds similar to some kind of "hints",
> > >>> but very limited and with all the drawbacks of hints (impossibility to
> > >>> use the full strength of the CBO). We can provide better "hints" support
> > >>> with the new engine as well.
> > >>>
> > >>> Fri, 1 Nov 2019 at 20:14, Denis Magda <dmagda@apache.org>:
> > >>>>
> > >>>> Ivan,
> > >>>>
> > >>>> I was involved in a couple of such use cases personally, so, that's not my
> > >>>> imagination ;) Even more, as far as I remember, the primary reason why we
> > >>>> improved our affinityRuns ensuring no partition is purged from a node until
> > >>>> a task is completed is because many users were running local SQL from
> > >>>> compute tasks and needed a guarantee that SQL will always return a correct
> > >>>> result set.
> > >>>>
> > >>>> -
> > >>>> Denis
> > >>>>
> > >>>>
> > >>>> On Fri, Nov 1, 2019 at 10:01 AM Ivan Pavlukhin <vololo100@gmail.com> wrote:
> > >>>>
> > >>>>> Denis,
> > >>>>>
> > >>>>> It would be nice to see real use cases of the affinity call + local SQL
> > >>>>> combination. Generally, the new engine will be able to infer collocation,
> > >>>>> resulting in the same collocated execution automatically.
> > >>>>>
> > >>>>> Fri, 1 Nov 2019 at 19:11, Denis Magda <dmagda@apache.org>:
> > >>>>>>
> > >>>>>> Hi Igor,
> > >>>>>>
> > >>>>>> Local queries feature is broadly used together with affinity-based compute
> > >>>>>> tasks:
> > >>>>>> https://apacheignite.readme.io/docs/collocate-compute-and-data#section-affinity-call-and-run-methods
> > >>>>>>
> > >>>>>> The use case is as follows. The user knows that all required data needed
> > >>>>>> for computation is collocated, and SQL is used as an advanced API for data
> > >>>>>> retrieval from the computation code. The affinity task ensures that
> > >>>>>> partitions won't be discarded from the node(s) if the topology changes
> > >>>>>> during the task execution and, thus, it's safe to run SQL locally skipping
> > >>>>>> distributed phases.
> > >>>>>>
> > >>>>>> The combination of affinity compute tasks with local SQL is a real and
> > >>>>>> valuable use case, and this is what we need to support with Calcite. Do you
> > >>>>>> see any challenges?
> > >>>>>>
> > >>>>>> -
> > >>>>>> Denis
> > >>>>>>
> > >>>>>>
> > >>>>>> On Fri, Nov 1, 2019 at 8:46 AM Roman Kondakov <kondakov87@mail.ru.invalid>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> Hi Igor!
> > >>>>>>>
> > >>>>>>> IMO we need to maintain the backward compatibility between old and new
> > >>>>>>> query engines as much as possible. And therefore we shouldn't change the
> > >>>>>>> behavior of local queries.
> > >>>>>>>
> > >>>>>>> So, for local queries Calcite's planner shouldn't consider the
> > >>>>>>> distribution trait at all.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> --
> > >>>>>>> Kind Regards
> > >>>>>>> Roman Kondakov
> > >>>>>>>
> > >>>>>>> On 01.11.2019 17:07, Seliverstov Igor wrote:
> > >>>>>>>> Hi Igniters,
> > >>>>>>>>
> > >>>>>>>> While working on the new generation of Ignite SQL, I faced a question: «Do we
> > >>>>>>>> need local queries at all and, if so, what semantics should they have?».
> > >>>>>>>>
> > >>>>>>>> The current planning flow consists of the following steps:
> > >>>>>>>>
> > >>>>>>>> 1) Parsing SQL to AST
> > >>>>>>>> 2) Validating AST (against Schema)
> > >>>>>>>> 3) Optimizing (Building execution graph)
> > >>>>>>>> 4) Splitting (into query fragments which execute on target nodes)
> nodes)
> > >>>>>>>> 5) Mapping (query fragments to nodes/partitions)
> > >>>>>>>>
> > >>>>>>>> At the last step we check that all Fragment sources (a table or result) have
> > >>>>>>>> the same distribution (in other words, all sources have to be co-located).
> > >>>>>>>> Planner and Splitter guarantee that all caches in a Fragment are
> > >>>>>>>> co-located, an Exchange is produced otherwise. But if we force local
> > >>>>>>>> execution we cannot produce Exchanges, that means we may face two
> > >>>>>>>> non-co-located caches inside a single query fragment (result of local query
> > >>>>>>>> planning is a single query fragment). So, we cannot pass the check.
> > >>>>>>>>
> > >>>>>>>> Should we throw an exception or omit the check for local query planning
> > >>>>>>>> or prohibit local queries at all?
> > >>>>>>>>
> > >>>>>>>> Your thoughts?
> > >>>>>>>>
> > >>>>>>>> Regards,
> > >>>>>>>> Igor
> > >>>>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> --
> > >>>>> Best regards,
> > >>>>> Ivan Pavlukhin
> > >>>>>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Best regards,
> > >>> Ivan Pavlukhin
> > >>
> > >>
> > >>
> >
> >
>
>
> --
> Best regards,
> Ivan Pavlukhin
>
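
The mapping-step check from the original post in this thread (all sources of a fragment must share one distribution) can be illustrated with a small plain-Java sketch. This is a toy model under stated assumptions, not Ignite or Calcite code: each source's distribution is reduced to an opaque string key, and `FragmentCheckSketch`, `coLocated` and `checkLocalFragment` are hypothetical names. Throwing an exception is only one of the options raised in the discussion.

```java
import java.util.HashSet;
import java.util.Map;

/** Toy model of the mapping-step check; all names are hypothetical, not Ignite or Calcite API. */
final class FragmentCheckSketch {
    /**
     * @param sourceDistributions fragment source name -> opaque key describing its distribution
     * @return true when every source shares one distribution, i.e. the sources are co-located
     */
    static boolean coLocated(Map<String, String> sourceDistributions) {
        return new HashSet<>(sourceDistributions.values()).size() <= 1;
    }

    /** One option from the thread: reject a forced-local fragment whose sources are not co-located. */
    static void checkLocalFragment(Map<String, String> sourceDistributions) {
        if (!coLocated(sourceDistributions))
            throw new IllegalStateException(
                "Local query fragment has non-co-located sources: " + sourceDistributions.keySet());
    }

    public static void main(String[] args) {
        System.out.println(coLocated(Map.of("PERSON", "by-orgId", "ORG", "by-orgId")));
        System.out.println(coLocated(Map.of("PERSON", "by-orgId", "CITY", "by-cityId")));
    }
}
```

In a non-local plan the second case would instead produce an Exchange between the two sources, which is exactly what forced local execution rules out.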
