A customizable policy is great since there's no standard. On Tue, Aug 11, 2015 at 5:54 AM, Jinfeng Ni wrote: > I think adding a policy in Calcite to derive SUM's return type makes sense, > as SQL standard seems not specify exactly what the return type of SUM > should be. As a result, each system could choose to use different policy in > the implementation. > > Oracle seems to use "the same data type as the numeric data type of the > argument" [1], while DB2 uses the policy "The result is a large integer if > the argument values are small integers" [2]. > > > [1] http://docs.oracle.com/database/121/SQLRF/functions196.htm#i89126 > [2] > > http://public.dhe.ibm.com/ps/products/db2/info/vr101/pdf/en_US/DB2SQLRefVol1-db2s1e1011.pdf > > On Mon, Aug 10, 2015 at 2:05 PM, Julian Hyde wrote: > > > Postgres return type is "bigint for smallint or int arguments, numeric > for > > bigint arguments, otherwise the same as the argument data type”[1] > > > > SQL Server return type is int for tinyint, smallint or int; bigint for > > bigint[2]. > > > > I can see your point that “User demands the correct sum result”. But I’d > > also be pissed with Postgres if it returned a numeric (arbitrary > precision) > > result when I am summing a bigint value. So I don’t think we’re going to > > please everyone. > > > > I think the solution is to add the policy to derive SUM’s return type to > > as a new method to RelDataTypeSystem. Then Kylin can supply its own. > > > > Julian > > > > [1] http://www.postgresql.org/docs/9.1/static/functions-aggregate.html < > > http://www.postgresql.org/docs/9.1/static/functions-aggregate.html> > > > > [2] https://msdn.microsoft.com/en-us/library/ms187810.aspx < > > https://msdn.microsoft.com/en-us/library/ms187810.aspx> > > > > > On Aug 9, 2015, at 8:09 PM, Li Yang wrote: > > > > > >> 1. If x is an integer, what is the type of sum(x)? > > > > > > This is the key question. If calcite believes sum(int) = int, then > Kylin > > > have to find solution else where. User demands the correct sum result > > > anyway. It's very hard to explain and justify the behavior to user, > > > because other SQL engines like Postgres seem simply works... > > > > > > One workaround maybe let all initial values be bigint. > > > > > > On Sat, Aug 8, 2015 at 9:02 AM, Julian Hyde wrote: > > > > > >> I would approach it a different way. There are 2 separate questions: > > >> > > >> 1. If x is an integer, what is the type of sum(x)? > > >> > > >> This concerns SQL query validation. Calcite’s answer is that if x has > > type > > >> T, then sum(x) has type T. Not perfect, but simple. If x is an int and > > you > > >> want the result to be a bigint, just write sum(cast(x as bigint)). > > >> > > >> 2. Do we detect overflow while calculating sum, and if so, how? > > >> > > >> This is an implementation question, and needs to be solved in each > > engine. > > >> Drill is one such engine, and Enumerable is another. Enumerable does > not > > >> currently detect overflow. > > >> > > >> One strategy would be to use a higher precision data type internally > > (but > > >> this strategy works only if you have an upper bound on the number of > > input > > >> rows). Another is to use a method such as java.lang.Math.addExact(int, > > int). > > >> > > >> Julian > > >> > > >> > > >> > > >>> On Aug 7, 2015, at 10:38 AM, Jinfeng Ni > wrote: > > >>> > > >>> I think it makes sense to use bigint as the result type for > > sum(integer). > > >>> > > >>> Postgres seems to work in this way. > > >>> > > >>> mydb=# \d+ emp > > >>> Table "public.emp" > > >>> Column | Type | Modifiers | Storage | Stats > target | > > >>> Description > > >>> > > >> > > > ----------+-----------------------+-----------+----------+--------------+------------- > > >>> empno | integer | | plain | > > | > > >>> ..... > > >>> > > >>> create table tmp as select sum(empno) sum_eno from emp; > > >>> > > >>> \d+ tmp; > > >>> Table "public.tmp" > > >>> Column | Type | Modifiers | Storage | Stats target | Description > > >>> ---------+--------+-----------+---------+--------------+------------- > > >>> sum_eno | bigint | | plain | | > > >>> > > >>> > > >>> As we can see, the column sum_eno in 'tmp' table after the CTAS > > statement > > >>> has bigint type. > > >>> > > >>> In Drill, we also use bigint for sum(integer). Drill has to put > > >> additional > > >>> logic, since Calcite by default will use int as the result type for > > >>> sum(int). > > >>> > > >>> > > >>> > > >>> > > >>> On Fri, Aug 7, 2015 at 6:51 AM, hongbin ma > > wrote: > > >>> > > >>>> hi, > > >>>> > > >>>> Suppose I have a table column called "price", its data type is > > integer. > > >>>> it seems that the sum aggregator in "select sum(price) from table" > > will > > >>>> return integer type, too. > > >>>> > > >>>> When I have millions of rows in the table, > > >>>> "select sum(price) from table" might overflow, is it a bug? > > >>>> Or may I how do you look into this problem? > > >>>> > > >>>> -- > > >>>> Regards, > > >>>> > > >>>> *Bin Mahone | 马洪宾* > > >>>> Apache Kylin: http://kylin.io > > >>>> Github: https://github.com/binmahone > > >>>> > > >> > > >> > > > > >