I think adding a policy in Calcite to derive SUM's return type makes sense,
as SQL standard seems not specify exactly what the return type of SUM
should be. As a result, each system could choose to use different policy in
the implementation.
Oracle seems to use "the same data type as the numeric data type of the
argument" [1], while DB2 uses the policy "The result is a large integer if
the argument values are small integers" [2].
[1] http://docs.oracle.com/database/121/SQLRF/functions196.htm#i89126
[2]
http://public.dhe.ibm.com/ps/products/db2/info/vr101/pdf/en_US/DB2SQLRefVol1db2s1e1011.pdf
On Mon, Aug 10, 2015 at 2:05 PM, Julian Hyde <jhyde@apache.org> wrote:
> Postgres return type is "bigint for smallint or int arguments, numeric for
> bigint arguments, otherwise the same as the argument data type”[1]
>
> SQL Server return type is int for tinyint, smallint or int; bigint for
> bigint[2].
>
> I can see your point that “User demands the correct sum result”. But I’d
> also be pissed with Postgres if it returned a numeric (arbitrary precision)
> result when I am summing a bigint value. So I don’t think we’re going to
> please everyone.
>
> I think the solution is to add the policy to derive SUM’s return type to
> as a new method to RelDataTypeSystem. Then Kylin can supply its own.
>
> Julian
>
> [1] http://www.postgresql.org/docs/9.1/static/functionsaggregate.html <
> http://www.postgresql.org/docs/9.1/static/functionsaggregate.html>
>
> [2] https://msdn.microsoft.com/enus/library/ms187810.aspx <
> https://msdn.microsoft.com/enus/library/ms187810.aspx>
>
> > On Aug 9, 2015, at 8:09 PM, Li Yang <liyang@apache.org> wrote:
> >
> >> 1. If x is an integer, what is the type of sum(x)?
> >
> > This is the key question. If calcite believes sum(int) = int, then Kylin
> > have to find solution else where. User demands the correct sum result
> > anyway. It's very hard to explain and justify the behavior to user,
> > because other SQL engines like Postgres seem simply works...
> >
> > One workaround maybe let all initial values be bigint.
> >
> > On Sat, Aug 8, 2015 at 9:02 AM, Julian Hyde <jhyde@apache.org> wrote:
> >
> >> I would approach it a different way. There are 2 separate questions:
> >>
> >> 1. If x is an integer, what is the type of sum(x)?
> >>
> >> This concerns SQL query validation. Calcite’s answer is that if x has
> type
> >> T, then sum(x) has type T. Not perfect, but simple. If x is an int and
> you
> >> want the result to be a bigint, just write sum(cast(x as bigint)).
> >>
> >> 2. Do we detect overflow while calculating sum, and if so, how?
> >>
> >> This is an implementation question, and needs to be solved in each
> engine.
> >> Drill is one such engine, and Enumerable is another. Enumerable does not
> >> currently detect overflow.
> >>
> >> One strategy would be to use a higher precision data type internally
> (but
> >> this strategy works only if you have an upper bound on the number of
> input
> >> rows). Another is to use a method such as java.lang.Math.addExact(int,
> int).
> >>
> >> Julian
> >>
> >>
> >>
> >>> On Aug 7, 2015, at 10:38 AM, Jinfeng Ni <jinfengni99@gmail.com> wrote:
> >>>
> >>> I think it makes sense to use bigint as the result type for
> sum(integer).
> >>>
> >>> Postgres seems to work in this way.
> >>>
> >>> mydb=# \d+ emp
> >>> Table "public.emp"
> >>> Column  Type  Modifiers  Storage  Stats target 
> >>> Description
> >>>
> >>
> +++++
> >>> empno  integer   plain 
> 
> >>> .....
> >>>
> >>> create table tmp as select sum(empno) sum_eno from emp;
> >>>
> >>> \d+ tmp;
> >>> Table "public.tmp"
> >>> Column  Type  Modifiers  Storage  Stats target  Description
> >>> +++++
> >>> sum_eno  bigint   plain  
> >>>
> >>>
> >>> As we can see, the column sum_eno in 'tmp' table after the CTAS
> statement
> >>> has bigint type.
> >>>
> >>> In Drill, we also use bigint for sum(integer). Drill has to put
> >> additional
> >>> logic, since Calcite by default will use int as the result type for
> >>> sum(int).
> >>>
> >>>
> >>>
> >>>
> >>> On Fri, Aug 7, 2015 at 6:51 AM, hongbin ma <mahongbin@apache.org>
> wrote:
> >>>
> >>>> hi,
> >>>>
> >>>> Suppose I have a table column called "price", its data type is
> integer.
> >>>> it seems that the sum aggregator in "select sum(price) from table"
> will
> >>>> return integer type, too.
> >>>>
> >>>> When I have millions of rows in the table,
> >>>> "select sum(price) from table" might overflow, is it a bug?
> >>>> Or may I how do you look into this problem?
> >>>>
> >>>> 
> >>>> Regards,
> >>>>
> >>>> *Bin Mahone  马洪宾*
> >>>> Apache Kylin: http://kylin.io
> >>>> Github: https://github.com/binmahone
> >>>>
> >>
> >>
>
>
