calcite-dev mailing list archives

From Gian Merlino <g...@imply.io>
Subject Re: Embed druid-sql inside Calcite?
Date Wed, 07 Feb 2018 18:18:29 GMT
I think druid-sql could support the Hive use case without too much
reworking. It has a method that returns a Sequence:

  public abstract Sequence<Object[]> runQuery();

But it also has another method that returns the Druid query, and Hive would
probably call that one:

  public DruidQuery toDruidQuery()
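
A minimal sketch of how those two entry points could serve different callers. Every name here is a hypothetical stand-in: PlannedRel, FakeRel, brokerPath and hivePath are illustrative, List<Object[]> stands in for Sequence<Object[]>, and a JSON string stands in for DruidQuery. It is not druid-sql's actual class layout.

```java
import java.util.ArrayList;
import java.util.List;

public class TwoEntryPoints {
    /** Hypothetical stand-in for a planned druid-sql node. */
    static abstract class PlannedRel {
        /** Executes the query; only meaningful where native queries can run. */
        abstract List<Object[]> runQuery();

        /** Returns the native query without executing it. */
        abstract String toDruidQuery();
    }

    /** Fake implementation that records whether execution happened. */
    static class FakeRel extends PlannedRel {
        boolean executed = false;

        @Override
        List<Object[]> runQuery() {
            executed = true;
            List<Object[]> rows = new ArrayList<>();
            rows.add(new Object[] {"a", 1L});
            return rows;
        }

        @Override
        String toDruidQuery() {
            return "{\"queryType\":\"scan\"}";
        }
    }

    /** Broker-style caller: plans and executes in the same process. */
    static List<Object[]> brokerPath(PlannedRel rel) {
        return rel.runQuery();
    }

    /** Hive-style caller: takes only the native query and runs it elsewhere. */
    static String hivePath(PlannedRel rel) {
        return rel.toDruidQuery();
    }

    public static void main(String[] args) {
        FakeRel rel = new FakeRel();
        System.out.println(hivePath(rel));
        System.out.println("executed after hivePath: " + rel.executed);
    }
}
```

The point of the sketch is that the Hive-style path never triggers execution, so it would not need a broker.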

Additionally, I guess Hive doesn't want to push "HAVING" and "ORDER BY"
down to Druid, so it should avoid adding those rules. There is enough
flexibility in druid-sql for that (pushdown of where, group by, having,
and order by are all implemented as separate rules).
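
A minimal sketch of that kind of rule selection, assuming one rule per pushdown. The Pushdown enum and the allRules/hiveRules helpers are hypothetical names for illustration only; the real rules are Calcite planner rules inside druid-sql, and their class names are not shown here.

```java
import java.util.EnumSet;
import java.util.Set;

public class PushdownRuleSelection {
    /** Hypothetical labels, one per separately implemented pushdown rule. */
    enum Pushdown { WHERE, GROUP_BY, HAVING, ORDER_BY }

    /** Rule set a caller would register to push everything down. */
    static Set<Pushdown> allRules() {
        return EnumSet.allOf(Pushdown.class);
    }

    /** Rule set for a caller like Hive that skips HAVING and ORDER BY. */
    static Set<Pushdown> hiveRules() {
        Set<Pushdown> rules = EnumSet.allOf(Pushdown.class);
        rules.remove(Pushdown.HAVING);
        rules.remove(Pushdown.ORDER_BY);
        return rules;
    }

    public static void main(String[] args) {
        System.out.println("all:  " + allRules());
        System.out.println("hive: " + hiveRules());
    }
}
```

Because each pushdown is its own rule, leaving one out only means not registering it; nothing else in the planner has to change.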

About reducing dependencies -- it would be tough, since druid-sql's
planning logic also uses Druid model classes (like ExtractionFn, Query,
etc.) as part of its rules, and so it depends on druid-processing pretty
deeply. Hopefully that would be acceptable to current users of
calcite-druid. I think this approach does have a big advantage: by using
Druid's own model classes, there is no need to implement serde and query
validation twice.

> I think, the hypothetical case you mentioned is also worth considering, to
> ease up the development process, we can consider moving calcite-druid as a
> module in druid, so that we make release of both druid-sql and
> calcite-adapter together.

By this, do you mean you're considering removing calcite-druid altogether?
So, if someone wants to use Calcite with Druid, they should depend on
druid-sql (or druid-calcite or whatever) rather than calcite-druid?

Gian

On Wed, Feb 7, 2018 at 9:07 AM, Nishant Bangarwa <nishant.monu51@gmail.com>
wrote:

> Having a focused effort into a single project would be great and would
> definitely help us in evolving druid sql capabilities faster.
>
> 1) One more thing that we need to consider here is that calcite
> druid-adapter is also used in Apache Hive where we use the druid rules to
> generate an optimized plan and then the druid query is executed from druid
> containers. In druid-sql I believe the query execution logic is tied to the
> fact that execution node is a druid-broker where native queries can be run
> to generate a Sequence of results. We might need some rework there to
> ensure that things work fine with hive too after proposed changes.
>
> 2) druid-sql dependencies can probably be reduced by separating the
> planning and execution logic in druid-sql. The planning logic need not
> depend on lots of druid code and can have light-weight dependencies, while
> the execution part and result serde, which pull in lots of druid
> dependencies, can reside in a separate module that the calcite
> druid-adapter need not depend on.
>
> I think, the hypothetical case you mentioned is also worth considering, to
> ease up the development process, we can consider moving calcite-druid as a
> module in druid, so that we make release of both druid-sql and
> calcite-adapter together.
>
> On Wed, 7 Feb 2018 at 09:02 Gian Merlino <gian@imply.io> wrote:
>
> > Hi Calcites,
> >
> > I would like to raise the idea of adding druid-sql
> > (http://search.maven.org/#artifactdetails%7Cio.druid%7Cdruid-sql%7C0.11.0%7Cjar)
> > as a dependency in Calcite's Druid adapter. It should reduce the size of
> > calcite-druid substantially, since it would mostly just be calling into
> > druid-sql.
> >
> > This has some advantages for both projects.
> >
> > 1) Support for new Druid features often appears in Druid SQL first. By
> > embedding druid-sql, Calcite gets these new features too, without extra
> > work. For example https://issues.apache.org/jira/browse/CALCITE-2170 is
> > an outstanding jira to add support for Druid expressions to Calcite, but
> > druid-sql already supports these. In fact it looks like some of the code
> > in the proposed patch is copied from druid-sql. As another example,
> > https://issues.apache.org/jira/browse/CALCITE-2077 switched table scans
> > from "select" to "scan", which had been previously done in Druid SQL in
> > https://github.com/druid-io/druid/pull/4751.
> >
> > 2) Depending on druid-sql means Calcite doesn't need to implement its own
> > Druid query and result serde code. Druid already has it.
> >
> > 3) Focused effort on a single module rather than the split effort that we
> > have today, where some developers are contributing to druid-sql and some
> > are contributing to calcite-druid.
> >
> > 4) More test coverage for both projects, presumably.
> >
> > I think (3) and (4) especially would give us the opportunity to improve
> > both projects much more rapidly.
> >
> > However, there are also some possible disadvantages.
> >
> > 1) druid-sql is a somewhat heavyweight module. It pulls in a lot of other
> > Druid code. Calcite users may prefer a lighter weight module.
> >
> > 2) druid-sql's APIs are not intended to be stable, and probably never
> > will be. They may break on minor releases. So updating the version of
> > druid-sql in Calcite may involve tweaking how functions are called, etc.
> > I think this effort should be minimal if calcite-druid is mostly just
> > delegating to druid-sql.
> >
> > 3) druid-sql depends on calcite-core. This should usually be fine, but it
> > means that if calcite-core has a breaking change, then calcite-druid
> > cannot update its version of druid-sql until druid-sql first updates its
> > version of calcite-core.
> >
> > Despite these potential difficulties, I think the potential benefit means
> > this is worth exploring.
> >
> > Finally: a hypothetical. Why not do the other way around -- have Druid
> > add calcite-druid as a dependency? The main reason is that this makes the
> > Druid development process awkward when a new Druid SQL feature also
> > requires a new native query feature. Today, we develop the native query
> > and SQL sides together. If Druid depended on calcite-druid, then we would
> > need to develop the native query side first, then release it, then update
> > Calcite's Druid adapter, then pull that back into Druid. Generally, just
> > adding an extra rule in druid-sql wouldn't be enough, since the sorts of
> > changes we are making at this point are typically more extensive than
> > just adjusting rules.
> >
> > Gian
> >
>
