calcite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stamatis Zampetakis <zabe...@gmail.com>
Subject Re: Re: Exception-handling in built-in functions
Date Thu, 18 Oct 2018 21:56:03 GMT
Hello,

First of all, thanks Hongze for working on this!

CALCITE-525 asks for throwing exceptions in certain circumstances and I
think the fix should focus exclusively on this.

Regarding the discussion about additional actions in case of an error
(omitting rows or logging) I believe it is useful but I think it could be
achieved by other means such as overloading the operators (using
SqlOperatorTable). Omitting rows or logging is not a very common policy to
become part of the core project. I think that projects relying on Calcite
and need this kind of functionality could provide their own operator
overloads relatively easy.

Providing additional built in functions, such as CATCH_ERROR, might become
handy in some situations but again it is something that individual projects
could easily plugin as a user defined function.

Note that, I am not against the existing PR but I would like to highlight
an alternative option.

Best,
Stamatis

Στις Πέμ, 18 Οκτ 2018 στις 1:36 μ.μ., ο/η notifyzhz@126.com <
notifyzhz@126.com> έγραψε:

> Hi,
>
> Thanks for opening this discussion and thank you all for your replies.
>
> The reason why I would like to help solve CALCITE-525 is because I agree
> that an function error should not easily break a large job, this "job"
> could indicate a large ad-hoc query or a query on a stream, or like what
> Julian said, a ETL job.
> According to some tests I had run about Calcite streaming, a simple
> division by zero error could break the whole query job. Say, If Calcite
> does not provide a option to silence the error, users should spend extract
> daily time on checking whether their streaming job is alive if their SQL
> includes a "/" operator.
>
> The solutions of B and C are different: B is to ignore and drop a whole
> row when error happens, and C is to make the failed call return null value.
> In C, the major reason to introduce "CATCH_ERROR" function is to represent
> the error handling on Rel, whether to enable the function in Parser is not
> important.
> At first I personally prefer C, but as C does need more code change than
> B, I don't have a very strong inclination between B and C now.
>
> At last, I agree that making enough discussion on introducing new feature
> is very important. If we don't really need such a feature now, I think I
> can help whenever it is considered helpful in future.
>
> Thanks,
> Hongze
>
> From: Jesus Camacho Rodriguez
> Date: 2018-10-18 10:26
> To: dev@calcite.apache.org
> Subject: Re: Exception-handling in built-in functions
> I do not believe there is enough reason to block CALCITE-525. IMO,
> CALCITE-525 describes a problem that some Calcite users are facing and a
> reasonable plugable solution. We should not be vetoing such a feature
> without providing viable alternatives. (Without having checked the specific
> implementation details, I prefer approach B described below as it is less
> intrusive. And A should be fixed in a different issue.)
> I agree with Julian´s idea that Calcite is not a RDBMS such as Oracle or
> Postgres, and it has always tried to provide flexibility to underlying
> engines, one of the reasons for its wide adoption. In addition, systems are
> not forced to use this feature, it is tagged as experimental and by default
> we are still running in same mode. I believe that is sufficient.
> Personally, I will not be happy if a developer feels compelled to fork
> Calcite or stop contributing code because we do not accept features such as
> the one described there.
>
> Thanks,
> Jesús
>
>
> On 10/17/18, 5:17 PM, "Michael Mior" <mmior@apache.org> wrote:
>
>     My apologies for missing this thread a couple days ago. (Thanks for
> pinging
>     it.) Here's my two cents: taking care of contributors to the project is
>     just as important (if not more important) than taking care of the
> code. I'm
>     not saying we should merge terrible code just to keep each other
> happy, but
>     I don't think that's the case here. If anyone writes some code which
> you
>     disagree with, you should be free to voice your disagreement. However,
>     especially when the code is from a core contributor and the argument
>     focuses on potential future problems, I think it's important to
> consider
>     that people who have shown dedication to the project over the years are
>     very likely to be around and willing to fix these problems as they
> arise.
>
>     Code which turns out to cause problems can always be deleted, reverted,
>     refactored, etc. It's much harder to back out when a contributor is
> burned
>     out or interpersonal conflicts get heated.
>
>     --
>     Michael Mior
>     mmior@apache.org
>
>
>     Le mer. 17 oct. 2018 à 14:58, Julian Hyde <jhyde@apache.org> a écrit :
>
>     > Vladimir,
>     >
>     > You’ve made your points. And I hear them.
>     >
>     > However I get the impression that you are not open to persuasion.
> Which
>     > means that I am wasting my time trying to reach consensus with you.
> Which
>     > means that people win arguments not on merit, but based upon who is
> most
>     > persistent.
>     >
>     > Here is my point. Calcite's goal is not to re-create what Oracle or
>     > PostgreSQL did ten years later. It is a platform that allows people
> to
>     > write their own data engine. If they want to redefine the “+”
> operator such
>     > that 2 + 2 returns 5, the platform should allow it.
>     >
>     > Certainly if they want to engineer their own error-handling
> strategy, we
>     > should let them do it. I didn’t have the energy to find an example
> of a SQL
>     > engine that discards rows with divide-by-zero errors, but I believe
> there
>     > is one. I suspect that both Broadbase, SQLstream and Hive, three SQL
>     > engines that I have worked on that performed ETL-like tasks, all had
> that
>     > capability. And all ETL tools have very flexible error-handling
> strategies.
>     > They are not SQL-based, but Calcite is not exclusively for SQL
> systems.
>     >
>     > I have been designing and building world-class data engines for 30
> years.
>     > Please take me on good faith that a flexible error-handing strategy
> is a
>     > good idea. Don’t force me to bicker over email for hours and hours.
> When a
>     > long discussion leads to the rejection of a contribution, I get
>     > considerably closer to burning out.
>     >
>     > Julian
>     >
>     >
>     > > On Oct 17, 2018, at 11:36 AM, Vladimir Sitnikov <
>     > sitnikov.vladimir@gmail.com> wrote:
>     > >
>     > > Juilian>Hey, folks. We need your input here.
>     > >
>     > > Here are my thoughts:
>     > > 1) I think the features we add should have at least some level of
>     > > consistency
>     > > 2) It is much safer to adopt well-known features rather than be
> pioneers
>     > in
>     > > the field. I do not mean we must wait for someone else implement
> and try
>     > > out a feature, however I would not rush for implementing a feature
> that
>     > > no-one else explored.
>     > >
>     > > CALCITE-525 has two key points:
>     > > A) Current implementation of enumerable factors code like 0/0 to a
> static
>     > > field of a generated code. It causes the generated code to fail at
> load
>     > > time even before the query is executed.
>     > > Of course that is a bug, and I'm even inclined to remove that
> "static
>     > > fields"
>     > >
>     > > B) Someone (Hongze? Juilan?) suggest to implement a mode to
> silently
>     > ignore
>     > > the error (e.g. by ignoring the row or by returning default value).
>     > > First of all, I don't think "ignore the row" kind of processing
> would do
>     > > any good to the user since it would not be possible to predict the
>     > output.
>     > > "ignore the row" is very tricky when join/semijoin/antijoin is
> there.
>     > >
>     > > I'm sure OracleDB and PostgreSQL do NOT have such "features", so I
> think
>     > we
>     > > should not rush for it.
>     > >
>     > > C) Hongze suggests  CATCH_ERROR(1 / 0  EMPTY ON ERROR) or
> CATCH_ERROR(1 /
>     > > 0)  EMPTY ON ERROR  kind of functions.
>     > > That enables to confine the scope of the error, however I don't
> think it
>     > > would be used often (does that mean one would have to wrap each
>     > > expression?), and this "catch error" would be non-trivial to
> propagate to
>     > > the downstream executors.
>     > > On top of that, we might end up inventing full-blown
>     > try-catch-catch-catch
>     > > syntax.
>     > >
>     > > I truly see no business value in implementing B/C, however I do
> see the
>     > > pain it would introduce. It would complicate Calcite maintenance.
> "B"
>     > could
>     > > silently produce wrong results, and I'm sure we don't want get
> results
>     > out
>     > > of thin air.
>     > >
>     > > Vladimir
>     >
>     >
>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message