calcite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Julian Hyde <>
Subject Re: DISTINCT ON clause
Date Wed, 07 Mar 2018 23:16:51 GMT
I always wished there was a variant of LIMIT that operated per key. E.g. return the youngest
2 employees in each department:

ORDER BY deptno LIMIT 2, birthdate DESC

(This is not standard syntax.)

DISTINCT ON is the special case LIMIT 1. But LIMIT 1 would not mess with SQL evaluation order
the way DISTINCT ON does.

Also, per-key limit is something that a sort-limit physical operator could usefully and efficiently

> On Mar 4, 2018, at 8:09 PM, Aman Sinha <> wrote:
> So far, I haven't had much success.. other databases such as Oracle, SQL
> Server and DB2 don't seem to support this at least based on their published
> SQL reference guides.
> One way people work around this is to use ROW_NUMBER() window function to
> achieve similar results.
> e.g
> a ORDER BY <some column>) as rownum FROM t) WHERE rownum = 1;
> Regarding the aggregate functions allowing arbitrary values in a group, I
> am not aware of databases that do that.  I am not sure how to verify this
> though.
> -Aman
> On Thu, Feb 22, 2018 at 10:06 PM, Aman Sinha <> wrote:
>> I will try to find out more about this during the next few days....what do
>> Oracle, SQL Server, DB2 support.
>> On Wed, Feb 21, 2018 at 2:12 PM, Julian Hyde <> wrote:
>>> Can you do some research, and see if any other databases do anything
>>> similar? Since Postgres isn’t standard, maybe we can improve upon it a bit.
>>> Also, do any databases have an aggregate function that takes an arbitrary
>>> value in the group, or asserts that all values are the same? We would use
>>> that when we translate the syntactic sugar to algebra.
>>> Julian
>>>> On Feb 21, 2018, at 2:02 PM, Aman Sinha <> wrote:
>>>> It is a useful functionality, especially since the SELECT list can
>>> contain
>>>> mix of primitive types and array or map type columns.
>>>> I do see your point about the Postgres semantics of ORDER BY vs DISTINCT
>>>> ON.
>>>> I don't like the hidden semantics of MySQL either.
>>>> Perhaps the compromise solution is to introduce functions such as
>>>> FIRST_ROW() aggregation function (similar to lead/lag window functions)
>>> and
>>>> use GROUP BY.
>>>> -Aman
>>>> On Wed, Feb 21, 2018 at 1:31 PM, Julian Hyde <> wrote:
>>>>> I can see that it would be useful. But there are a couple of things
>>> about
>>>>> it that are messy.
>>>>> I think the syntax is a bit clunky because it uses parentheses; a query
>>>>> would be difficult to read if people would like multiple columns,
>>>>> expressions, and aliases.
>>>>> I also think the semantics are messy. If you read
>>>>> you will see that “DISTINCT ON” is evaluated after the ORDER BY clause,
>>>>> whereas regular DISTINCT is evaluated before the ORDER BY clause. So
>>> there
>>>>> will be a bizarre interaction if DISTINCT ON is used with UNION and
>>>>> BY.
>>>>> By the way, good ol’ MySQL doesn’t have this problem; you can just
>>> write
>>>>> SELECT a, b, c FROM t GROUP BY a
>>>>> and it just picks the first value of b and c. This “feature” is one
>>> my
>>>>> least favorite things about MySQL, so let’s not emulate it. :)
>>>>> Julian
>>>>>> On Feb 21, 2018, at 12:44 PM, Aman Sinha <>
>>> wrote:
>>>>>> The DISTINCT 'ON' clause is not supported but I am trying to see
>>> there
>>>>>> are other people who have run into this.  One of the use cases I
>>>>>> intuitively maps to something like this:
>>>>>> SELECT DISTINCT *ON (a)*,  b, c  FROM T
>>>>>> Here  suppose 'a' is an INT  and b, c are some complex types such
>>>>>> array.  In my example, the values of  b and c happen to be the same
>>>>> all
>>>>>> rows belonging to the same group of 'a', so I just want the first
>>>>>> Since these are arrays, I cannot use MIN(b) GROUP BY a .   Other
>>>>>> alternative is to create a new aggregate function that picks the
>>>>>> value but it makes the syntax verbose especially with large number
>>>>> such
>>>>>> columns.
>>>>>> Incidentally, Postgres supports this [1]
>>>>>> [1]
>>>>> select.html#SQL-DISTINCT
>>>>>> -Aman

View raw message