calcite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Aman Sinha <amansi...@apache.org>
Subject Re: DISTINCT ON clause
Date Fri, 23 Feb 2018 06:06:37 GMT
I will try to find out more about this during the next few days....what do
Oracle, SQL Server, DB2 support.

On Wed, Feb 21, 2018 at 2:12 PM, Julian Hyde <jhyde@apache.org> wrote:

> Can you do some research, and see if any other databases do anything
> similar? Since Postgres isn’t standard, maybe we can improve upon it a bit.
>
> Also, do any databases have an aggregate function that takes an arbitrary
> value in the group, or asserts that all values are the same? We would use
> that when we translate the syntactic sugar to algebra.
>
> Julian
>
>
> > On Feb 21, 2018, at 2:02 PM, Aman Sinha <amansinha@apache.org> wrote:
> >
> > It is a useful functionality, especially since the SELECT list can
> contain
> > mix of primitive types and array or map type columns.
> > I do see your point about the Postgres semantics of ORDER BY vs DISTINCT
> > ON.
> > I don't like the hidden semantics of MySQL either.
> > Perhaps the compromise solution is to introduce functions such as
> > FIRST_ROW() aggregation function (similar to lead/lag window functions)
> and
> > use GROUP BY.
> >
> > -Aman
> >
> > On Wed, Feb 21, 2018 at 1:31 PM, Julian Hyde <jhyde@apache.org> wrote:
> >
> >> I can see that it would be useful. But there are a couple of things
> about
> >> it that are messy.
> >>
> >> I think the syntax is a bit clunky because it uses parentheses; a query
> >> would be difficult to read if people would like multiple columns,
> >> expressions, and aliases.
> >>
> >> I also think the semantics are messy. If you read
> >> https://www.postgresql.org/docs/9.5/static/sql-select.html#SQL-DISTINCT
> <
> >> https://www.postgresql.org/docs/9.5/static/sql-select.html#SQL-DISTINCT
> >
> >> you will see that “DISTINCT ON” is evaluated after the ORDER BY clause,
> >> whereas regular DISTINCT is evaluated before the ORDER BY clause. So
> there
> >> will be a bizarre interaction if DISTINCT ON is used with UNION and
> ORDER
> >> BY.
> >>
> >> By the way, good ol’ MySQL doesn’t have this problem; you can just write
> >>
> >> SELECT a, b, c FROM t GROUP BY a
> >>
> >> and it just picks the first value of b and c. This “feature” is one of
> my
> >> least favorite things about MySQL, so let’s not emulate it. :)
> >>
> >> Julian
> >>
> >>
> >>> On Feb 21, 2018, at 12:44 PM, Aman Sinha <amansinha@apache.org> wrote:
> >>>
> >>> The DISTINCT 'ON' clause is not supported but I am trying to see if
> there
> >>> are other people who have run into this.  One of the use cases I have
> >>> intuitively maps to something like this:
> >>>
> >>> SELECT DISTINCT *ON (a)*,  b, c  FROM T
> >>>
> >>> Here  suppose 'a' is an INT  and b, c are some complex types such as
> >>> array.  In my example, the values of  b and c happen to be the same in
> >> all
> >>> rows belonging to the same group of 'a', so I just want the first row.
> >>> Since these are arrays, I cannot use MIN(b) GROUP BY a .   Other
> >>> alternative is to create a new aggregate function that picks the first
> >>> value but it makes the syntax verbose especially with large number of
> >> such
> >>> columns.
> >>>
> >>> Incidentally, Postgres supports this [1]
> >>>
> >>> [1]
> >>> https://www.postgresql.org/docs/current/static/sql-
> >> select.html#SQL-DISTINCT
> >>>
> >>>
> >>> -Aman
> >>
> >>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message