calcite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Aman Sinha <amansi...@apache.org>
Subject Re: DISTINCT ON clause
Date Wed, 21 Feb 2018 22:02:07 GMT
It is a useful functionality, especially since the SELECT list can contain
mix of primitive types and array or map type columns.
I do see your point about the Postgres semantics of ORDER BY vs DISTINCT
ON.
I don't like the hidden semantics of MySQL either.
Perhaps the compromise solution is to introduce functions such as
FIRST_ROW() aggregation function (similar to lead/lag window functions) and
use GROUP BY.

-Aman

On Wed, Feb 21, 2018 at 1:31 PM, Julian Hyde <jhyde@apache.org> wrote:

> I can see that it would be useful. But there are a couple of things about
> it that are messy.
>
> I think the syntax is a bit clunky because it uses parentheses; a query
> would be difficult to read if people would like multiple columns,
> expressions, and aliases.
>
> I also think the semantics are messy. If you read
> https://www.postgresql.org/docs/9.5/static/sql-select.html#SQL-DISTINCT <
> https://www.postgresql.org/docs/9.5/static/sql-select.html#SQL-DISTINCT>
> you will see that “DISTINCT ON” is evaluated after the ORDER BY clause,
> whereas regular DISTINCT is evaluated before the ORDER BY clause. So there
> will be a bizarre interaction if DISTINCT ON is used with UNION and ORDER
> BY.
>
> By the way, good ol’ MySQL doesn’t have this problem; you can just write
>
>  SELECT a, b, c FROM t GROUP BY a
>
> and it just picks the first value of b and c. This “feature” is one of my
> least favorite things about MySQL, so let’s not emulate it. :)
>
> Julian
>
>
> > On Feb 21, 2018, at 12:44 PM, Aman Sinha <amansinha@apache.org> wrote:
> >
> > The DISTINCT 'ON' clause is not supported but I am trying to see if there
> > are other people who have run into this.  One of the use cases I have
> > intuitively maps to something like this:
> >
> >  SELECT DISTINCT *ON (a)*,  b, c  FROM T
> >
> > Here  suppose 'a' is an INT  and b, c are some complex types such as
> > array.  In my example, the values of  b and c happen to be the same in
> all
> > rows belonging to the same group of 'a', so I just want the first row.
> > Since these are arrays, I cannot use MIN(b) GROUP BY a .   Other
> > alternative is to create a new aggregate function that picks the first
> > value but it makes the syntax verbose especially with large number of
> such
> > columns.
> >
> > Incidentally, Postgres supports this [1]
> >
> > [1]
> > https://www.postgresql.org/docs/current/static/sql-
> select.html#SQL-DISTINCT
> >
> >
> > -Aman
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message