calcite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Li Yang <liy...@apache.org>
Subject Re: sum of integer overflow
Date Tue, 11 Aug 2015 09:17:46 GMT
A customizable policy is great since there's no standard.

On Tue, Aug 11, 2015 at 5:54 AM, Jinfeng Ni <jinfengni99@gmail.com> wrote:

> I think adding a policy in Calcite to derive SUM's return type makes sense,
> as SQL standard seems not specify exactly what the return type of SUM
> should be. As a result, each system could choose to use different policy in
> the implementation.
>
> Oracle seems to use "the same data type as the numeric data type of the
> argument" [1], while DB2 uses the policy "The result is a large integer if
> the argument values are small integers" [2].
>
>
> [1] http://docs.oracle.com/database/121/SQLRF/functions196.htm#i89126
> [2]
>
> http://public.dhe.ibm.com/ps/products/db2/info/vr101/pdf/en_US/DB2SQLRefVol1-db2s1e1011.pdf
>
> On Mon, Aug 10, 2015 at 2:05 PM, Julian Hyde <jhyde@apache.org> wrote:
>
> > Postgres return type is "bigint for smallint or int arguments, numeric
> for
> > bigint arguments, otherwise the same as the argument data type”[1]
> >
> > SQL Server return type is int for tinyint, smallint or int; bigint for
> > bigint[2].
> >
> > I can see your point that “User demands the correct sum result”. But I’d
> > also be pissed with Postgres if it returned a numeric (arbitrary
> precision)
> > result when I am summing a bigint value. So I don’t think we’re going to
> > please everyone.
> >
> > I think the solution is to add the policy to derive SUM’s return type to
> > as a new method to RelDataTypeSystem. Then Kylin can supply its own.
> >
> > Julian
> >
> > [1] http://www.postgresql.org/docs/9.1/static/functions-aggregate.html <
> > http://www.postgresql.org/docs/9.1/static/functions-aggregate.html>
> >
> > [2] https://msdn.microsoft.com/en-us/library/ms187810.aspx <
> > https://msdn.microsoft.com/en-us/library/ms187810.aspx>
> >
> > > On Aug 9, 2015, at 8:09 PM, Li Yang <liyang@apache.org> wrote:
> > >
> > >> 1. If x is an integer, what is the type of sum(x)?
> > >
> > > This is the key question.  If calcite believes sum(int) = int, then
> Kylin
> > > have to find solution else where.  User demands the correct sum result
> > > anyway.  It's very hard to explain and justify the behavior to user,
> > > because other SQL engines like Postgres seem simply works...
> > >
> > > One workaround maybe let all initial values be bigint.
> > >
> > > On Sat, Aug 8, 2015 at 9:02 AM, Julian Hyde <jhyde@apache.org> wrote:
> > >
> > >> I would approach it a different way. There are 2 separate questions:
> > >>
> > >> 1. If x is an integer, what is the type of sum(x)?
> > >>
> > >> This concerns SQL query validation. Calcite’s answer is that if x has
> > type
> > >> T, then sum(x) has type T. Not perfect, but simple. If x is an int and
> > you
> > >> want the result to be a bigint, just write sum(cast(x as bigint)).
> > >>
> > >> 2. Do we detect overflow while calculating sum, and if so, how?
> > >>
> > >> This is an implementation question, and needs to be solved in each
> > engine.
> > >> Drill is one such engine, and Enumerable is another. Enumerable does
> not
> > >> currently detect overflow.
> > >>
> > >> One strategy would be to use a  higher precision data type internally
> > (but
> > >> this strategy works only if you have an upper bound on the number of
> > input
> > >> rows). Another is to use a method such as java.lang.Math.addExact(int,
> > int).
> > >>
> > >> Julian
> > >>
> > >>
> > >>
> > >>> On Aug 7, 2015, at 10:38 AM, Jinfeng Ni <jinfengni99@gmail.com>
> wrote:
> > >>>
> > >>> I think it makes sense to use bigint as the result type for
> > sum(integer).
> > >>>
> > >>> Postgres seems to work in this way.
> > >>>
> > >>> mydb=# \d+ emp
> > >>>                                 Table "public.emp"
> > >>> Column  |         Type          | Modifiers | Storage  | Stats
> target |
> > >>> Description
> > >>>
> > >>
> >
> ----------+-----------------------+-----------+----------+--------------+-------------
> > >>> empno    | integer               |           | plain    |
> > |
> > >>> .....
> > >>>
> > >>> create table tmp as select sum(empno) sum_eno from emp;
> > >>>
> > >>> \d+ tmp;
> > >>>                        Table "public.tmp"
> > >>> Column  |  Type  | Modifiers | Storage | Stats target | Description
> > >>> ---------+--------+-----------+---------+--------------+-------------
> > >>> sum_eno | bigint |           | plain   |              |
> > >>>
> > >>>
> > >>> As we can see, the column sum_eno in 'tmp' table after the CTAS
> > statement
> > >>> has bigint type.
> > >>>
> > >>> In Drill, we also use bigint for sum(integer).  Drill has to put
> > >> additional
> > >>> logic, since Calcite by default will use int as the result type for
> > >>> sum(int).
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> On Fri, Aug 7, 2015 at 6:51 AM, hongbin ma <mahongbin@apache.org>
> > wrote:
> > >>>
> > >>>> hi,
> > >>>>
> > >>>> Suppose I have a table column called "price", its data type is
> > integer.
> > >>>> it seems that the sum aggregator in "select sum(price) from table"
> > will
> > >>>> return integer type, too.
> > >>>>
> > >>>> When I have millions of rows in the table,
> > >>>> "select sum(price) from table" might overflow, is it a bug?
> > >>>> Or may I how do you look into this problem?
> > >>>>
> > >>>> --
> > >>>> Regards,
> > >>>>
> > >>>> *Bin Mahone | 马洪宾*
> > >>>> Apache Kylin: http://kylin.io
> > >>>> Github: https://github.com/binmahone
> > >>>>
> > >>
> > >>
> >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message