calcite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Aman Sinha <amansi...@apache.org>
Subject Re: DISTINCT ON clause
Date Mon, 05 Mar 2018 04:09:46 GMT
So far, I haven't had much success.. other databases such as Oracle, SQL
Server and DB2 don't seem to support this at least based on their published
SQL reference guides.

One way people work around this is to use ROW_NUMBER() window function to
achieve similar results.
e.g
     SELECT a, b, c FROM (SELECT a, b, c , ROW_NUMBER() OVER (PARTITION BY
a ORDER BY <some column>) as rownum FROM t) WHERE rownum = 1;

Regarding the aggregate functions allowing arbitrary values in a group, I
am not aware of databases that do that.  I am not sure how to verify this
though.

-Aman

On Thu, Feb 22, 2018 at 10:06 PM, Aman Sinha <amansinha@apache.org> wrote:

> I will try to find out more about this during the next few days....what do
> Oracle, SQL Server, DB2 support.
>
> On Wed, Feb 21, 2018 at 2:12 PM, Julian Hyde <jhyde@apache.org> wrote:
>
>> Can you do some research, and see if any other databases do anything
>> similar? Since Postgres isn’t standard, maybe we can improve upon it a bit.
>>
>> Also, do any databases have an aggregate function that takes an arbitrary
>> value in the group, or asserts that all values are the same? We would use
>> that when we translate the syntactic sugar to algebra.
>>
>> Julian
>>
>>
>> > On Feb 21, 2018, at 2:02 PM, Aman Sinha <amansinha@apache.org> wrote:
>> >
>> > It is a useful functionality, especially since the SELECT list can
>> contain
>> > mix of primitive types and array or map type columns.
>> > I do see your point about the Postgres semantics of ORDER BY vs DISTINCT
>> > ON.
>> > I don't like the hidden semantics of MySQL either.
>> > Perhaps the compromise solution is to introduce functions such as
>> > FIRST_ROW() aggregation function (similar to lead/lag window functions)
>> and
>> > use GROUP BY.
>> >
>> > -Aman
>> >
>> > On Wed, Feb 21, 2018 at 1:31 PM, Julian Hyde <jhyde@apache.org> wrote:
>> >
>> >> I can see that it would be useful. But there are a couple of things
>> about
>> >> it that are messy.
>> >>
>> >> I think the syntax is a bit clunky because it uses parentheses; a query
>> >> would be difficult to read if people would like multiple columns,
>> >> expressions, and aliases.
>> >>
>> >> I also think the semantics are messy. If you read
>> >> https://www.postgresql.org/docs/9.5/static/sql-select.html#
>> SQL-DISTINCT <
>> >> https://www.postgresql.org/docs/9.5/static/sql-select.html#
>> SQL-DISTINCT>
>> >> you will see that “DISTINCT ON” is evaluated after the ORDER BY clause,
>> >> whereas regular DISTINCT is evaluated before the ORDER BY clause. So
>> there
>> >> will be a bizarre interaction if DISTINCT ON is used with UNION and
>> ORDER
>> >> BY.
>> >>
>> >> By the way, good ol’ MySQL doesn’t have this problem; you can just
>> write
>> >>
>> >> SELECT a, b, c FROM t GROUP BY a
>> >>
>> >> and it just picks the first value of b and c. This “feature” is one
of
>> my
>> >> least favorite things about MySQL, so let’s not emulate it. :)
>> >>
>> >> Julian
>> >>
>> >>
>> >>> On Feb 21, 2018, at 12:44 PM, Aman Sinha <amansinha@apache.org>
>> wrote:
>> >>>
>> >>> The DISTINCT 'ON' clause is not supported but I am trying to see if
>> there
>> >>> are other people who have run into this.  One of the use cases I have
>> >>> intuitively maps to something like this:
>> >>>
>> >>> SELECT DISTINCT *ON (a)*,  b, c  FROM T
>> >>>
>> >>> Here  suppose 'a' is an INT  and b, c are some complex types such as
>> >>> array.  In my example, the values of  b and c happen to be the same
in
>> >> all
>> >>> rows belonging to the same group of 'a', so I just want the first row.
>> >>> Since these are arrays, I cannot use MIN(b) GROUP BY a .   Other
>> >>> alternative is to create a new aggregate function that picks the first
>> >>> value but it makes the syntax verbose especially with large number of
>> >> such
>> >>> columns.
>> >>>
>> >>> Incidentally, Postgres supports this [1]
>> >>>
>> >>> [1]
>> >>> https://www.postgresql.org/docs/current/static/sql-
>> >> select.html#SQL-DISTINCT
>> >>>
>> >>>
>> >>> -Aman
>> >>
>> >>
>>
>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message