calcite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Julian Hyde <jh...@apache.org>
Subject Re: DISTINCT ON clause
Date Wed, 21 Feb 2018 22:12:13 GMT
Can you do some research, and see if any other databases do anything similar? Since Postgres
isn’t standard, maybe we can improve upon it a bit.

Also, do any databases have an aggregate function that takes an arbitrary value in the group,
or asserts that all values are the same? We would use that when we translate the syntactic
sugar to algebra.

Julian


> On Feb 21, 2018, at 2:02 PM, Aman Sinha <amansinha@apache.org> wrote:
> 
> It is a useful functionality, especially since the SELECT list can contain
> mix of primitive types and array or map type columns.
> I do see your point about the Postgres semantics of ORDER BY vs DISTINCT
> ON.
> I don't like the hidden semantics of MySQL either.
> Perhaps the compromise solution is to introduce functions such as
> FIRST_ROW() aggregation function (similar to lead/lag window functions) and
> use GROUP BY.
> 
> -Aman
> 
> On Wed, Feb 21, 2018 at 1:31 PM, Julian Hyde <jhyde@apache.org> wrote:
> 
>> I can see that it would be useful. But there are a couple of things about
>> it that are messy.
>> 
>> I think the syntax is a bit clunky because it uses parentheses; a query
>> would be difficult to read if people would like multiple columns,
>> expressions, and aliases.
>> 
>> I also think the semantics are messy. If you read
>> https://www.postgresql.org/docs/9.5/static/sql-select.html#SQL-DISTINCT <
>> https://www.postgresql.org/docs/9.5/static/sql-select.html#SQL-DISTINCT>
>> you will see that “DISTINCT ON” is evaluated after the ORDER BY clause,
>> whereas regular DISTINCT is evaluated before the ORDER BY clause. So there
>> will be a bizarre interaction if DISTINCT ON is used with UNION and ORDER
>> BY.
>> 
>> By the way, good ol’ MySQL doesn’t have this problem; you can just write
>> 
>> SELECT a, b, c FROM t GROUP BY a
>> 
>> and it just picks the first value of b and c. This “feature” is one of my
>> least favorite things about MySQL, so let’s not emulate it. :)
>> 
>> Julian
>> 
>> 
>>> On Feb 21, 2018, at 12:44 PM, Aman Sinha <amansinha@apache.org> wrote:
>>> 
>>> The DISTINCT 'ON' clause is not supported but I am trying to see if there
>>> are other people who have run into this.  One of the use cases I have
>>> intuitively maps to something like this:
>>> 
>>> SELECT DISTINCT *ON (a)*,  b, c  FROM T
>>> 
>>> Here  suppose 'a' is an INT  and b, c are some complex types such as
>>> array.  In my example, the values of  b and c happen to be the same in
>> all
>>> rows belonging to the same group of 'a', so I just want the first row.
>>> Since these are arrays, I cannot use MIN(b) GROUP BY a .   Other
>>> alternative is to create a new aggregate function that picks the first
>>> value but it makes the syntax verbose especially with large number of
>> such
>>> columns.
>>> 
>>> Incidentally, Postgres supports this [1]
>>> 
>>> [1]
>>> https://www.postgresql.org/docs/current/static/sql-
>> select.html#SQL-DISTINCT
>>> 
>>> 
>>> -Aman
>> 
>> 


Mime
View raw message