calcite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Julian Hyde <jh...@apache.org>
Subject Re: sum of integer overflow
Date Mon, 10 Aug 2015 21:05:25 GMT
Postgres return type is "bigint for smallint or int arguments, numeric for bigint arguments,
otherwise the same as the argument data type”[1]

SQL Server return type is int for tinyint, smallint or int; bigint for bigint[2].

I can see your point that “User demands the correct sum result”. But I’d also be pissed
with Postgres if it returned a numeric (arbitrary precision) result when I am summing a bigint
value. So I don’t think we’re going to please everyone.

I think the solution is to add the policy to derive SUM’s return type to as a new method
to RelDataTypeSystem. Then Kylin can supply its own.

Julian

[1] http://www.postgresql.org/docs/9.1/static/functions-aggregate.html <http://www.postgresql.org/docs/9.1/static/functions-aggregate.html>

[2] https://msdn.microsoft.com/en-us/library/ms187810.aspx <https://msdn.microsoft.com/en-us/library/ms187810.aspx>

> On Aug 9, 2015, at 8:09 PM, Li Yang <liyang@apache.org> wrote:
> 
>> 1. If x is an integer, what is the type of sum(x)?
> 
> This is the key question.  If calcite believes sum(int) = int, then Kylin
> have to find solution else where.  User demands the correct sum result
> anyway.  It's very hard to explain and justify the behavior to user,
> because other SQL engines like Postgres seem simply works...
> 
> One workaround maybe let all initial values be bigint.
> 
> On Sat, Aug 8, 2015 at 9:02 AM, Julian Hyde <jhyde@apache.org> wrote:
> 
>> I would approach it a different way. There are 2 separate questions:
>> 
>> 1. If x is an integer, what is the type of sum(x)?
>> 
>> This concerns SQL query validation. Calcite’s answer is that if x has type
>> T, then sum(x) has type T. Not perfect, but simple. If x is an int and you
>> want the result to be a bigint, just write sum(cast(x as bigint)).
>> 
>> 2. Do we detect overflow while calculating sum, and if so, how?
>> 
>> This is an implementation question, and needs to be solved in each engine.
>> Drill is one such engine, and Enumerable is another. Enumerable does not
>> currently detect overflow.
>> 
>> One strategy would be to use a  higher precision data type internally (but
>> this strategy works only if you have an upper bound on the number of input
>> rows). Another is to use a method such as java.lang.Math.addExact(int, int).
>> 
>> Julian
>> 
>> 
>> 
>>> On Aug 7, 2015, at 10:38 AM, Jinfeng Ni <jinfengni99@gmail.com> wrote:
>>> 
>>> I think it makes sense to use bigint as the result type for sum(integer).
>>> 
>>> Postgres seems to work in this way.
>>> 
>>> mydb=# \d+ emp
>>>                                 Table "public.emp"
>>> Column  |         Type          | Modifiers | Storage  | Stats target |
>>> Description
>>> 
>> ----------+-----------------------+-----------+----------+--------------+-------------
>>> empno    | integer               |           | plain    |              |
>>> .....
>>> 
>>> create table tmp as select sum(empno) sum_eno from emp;
>>> 
>>> \d+ tmp;
>>>                        Table "public.tmp"
>>> Column  |  Type  | Modifiers | Storage | Stats target | Description
>>> ---------+--------+-----------+---------+--------------+-------------
>>> sum_eno | bigint |           | plain   |              |
>>> 
>>> 
>>> As we can see, the column sum_eno in 'tmp' table after the CTAS statement
>>> has bigint type.
>>> 
>>> In Drill, we also use bigint for sum(integer).  Drill has to put
>> additional
>>> logic, since Calcite by default will use int as the result type for
>>> sum(int).
>>> 
>>> 
>>> 
>>> 
>>> On Fri, Aug 7, 2015 at 6:51 AM, hongbin ma <mahongbin@apache.org> wrote:
>>> 
>>>> hi,
>>>> 
>>>> Suppose I have a table column called "price", its data type is integer.
>>>> it seems that the sum aggregator in "select sum(price) from table" will
>>>> return integer type, too.
>>>> 
>>>> When I have millions of rows in the table,
>>>> "select sum(price) from table" might overflow, is it a bug?
>>>> Or may I how do you look into this problem?
>>>> 
>>>> --
>>>> Regards,
>>>> 
>>>> *Bin Mahone | 马洪宾*
>>>> Apache Kylin: http://kylin.io
>>>> Github: https://github.com/binmahone
>>>> 
>> 
>> 


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message