drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Abdel Hakim Deneche <adene...@maprtech.com>
Subject Re: [DISCUSSION] should BufferAllocator.buffer() throw an OutOfMemoryException ?
Date Tue, 30 Jun 2015 22:35:23 GMT
I started this discussion while fixing DRILL-3312
<https://issues.apache.org/jira/browse/DRILL-3312>, it took me quite some
time to debug and find the root of an error I was seeing. I first thought
about fixing those few places that call buffer() without checking if the
buffer is null or not, but this won't prevent new code from repeating the
same mistakes.


On Tue, Jun 30, 2015 at 3:19 PM, Jacques Nadeau <jacques@apache.org> wrote:

> Lots of different comments on this thread.  Here are few of my thoughts:
>
> I think you need to differentiate the different tiers of interfaces a
> little bit more.
>
> BufferAllocator.buffer() was meant to be a low level advanced interface.
> It should be used rarely and when done so, only with extreme care.  Most
> devs shouldn't use it.  Having an advanced interface seems fine here.  This
> was how the interface was envisioned.   It looks like there are currently
> 16 classes that reference this code not including vectors.  (There probably
> should be less, it looks like some code duplication contributes to this.)
>
> All the vector classes also access this code.  Vectors have two methods:
> allocateNew and allocateNewSafe.  allocateNew has an exception behavior and
> allocateNewSafe has a return value behavior.  These are main APIs that are
> used extensively throughout the code and the ones we should be most focused
> on.  I like giving the developer the choice of behavior (thus the two
> options).  It sounds like others are saying to remove this choice (because
> code is not managing it correctly).
>
> If there are bugs in the 16 classes on how they use the advanced interface,
> this doesn't seem like a big challenge to correct rather than bulk
> modifying the API.  If people are using the wrong method on the vector
> classes (or the vector classes aren't correctly behaving), that doesn't
> seem like something we should address at the allocator API level.
>
> Hakim, can you be more specific about your concern?  It doesn't seem like
> we're talking about thousands of places here.
>
> Regarding checked versus unchecked...  I'm strongly in support of using
> checked exceptions for most things.  However, I'm against it for an out of
> memory condition.  (Note how the JVM also treats this as an unchecked
> exception.) It causes early catching when, in most cases, we really want to
> bubble way up the stack and fail the query.  I know of very few places
> where we can actually locally manage an out of memory condition so why
> write huge amounts of code the bubble the exception all the way to where we
> can actually handle it.
>
> A quick sidebar to Jason's "Hash agg out of memory" comment
> Two thoughts here.  I'm not sure this is a meaningful message.  Where we
> run out of memory may have nothing to do with where we are using most
> memory.  And if we did to decide this was useful, this can be done just as
> easily with unchecked exceptions as checked exception.  What I'm want to
> avoid is adding throws OutOfMemoryException in 1000 places.  The reality is
> everybody should assume that an OutOfMemory can occur pretty much anytime.
>
> TL;DR
>
> In general, I think this discussion is too abstract and thus has the danger
> of trending towards a religious discussion (I already see a bit of that
> fervor in some of the responses.).  The number of places we are talking
> about are not that large and are easy to find.  Let's solve this by looking
> at specifics.  For example, if people think checked would be better,
> someone should go through and make an example changeset for everyone to
> review.  My guess is, the only that it works will be if very quickly, the
> handling code converts the checked exception into an unchecked
> UserException.  But I'm more than happy to be proven wrong.
>
> On Tue, Jun 30, 2015 at 2:16 PM, Daniel Barclay <dbarclay@maprtech.com>
> wrote:
>
> > Hanifi GUNES wrote:
> >
> >> - We would end up having to add it to almost everything everywhere
> >> Why would one propagate the checked exception for no reason? And why
> would
> >> one not propagate the exception for a good reason like robustness? I
> agree
> >> that one has to avoid propagating the checked exception for no reason
> >> however I support propagating it for a good reason.
> >>
> >> The added benefit of raising a checked exception is reminding as well as
> >> enforcing devs to handle it and be more cautious about this particular
> >> event. I find this compile-time safety check invaluable for robustness.
> >>
> >
> > +(1 times <some large number>)
> >
> > Daniel
> >
> >
> >
> >>
> >> - Or constantly wrapping it with RuntimeException to get around that
> >> If it has to be done, I would recommend relying on a helper to do so.
> >> There
> >> is not much of man-work involved here.
> >>
> >>
> >> 2015-06-30 13:53 GMT-07:00 Abdel Hakim Deneche <adeneche@maprtech.com>:
> >>
> >>  +1 to Hanifi's
> >>>
> >>> On Tue, Jun 30, 2015 at 1:38 PM, Jason Altekruse <
> >>> altekrusejason@gmail.com
> >>>
> >>>>
> >>>>  wrote:
> >>>
> >>>  +1 to Hanifi's comments, I think it makes much more sense to have a
> >>>>
> >>> number
> >>>
> >>>> of sites where the operators are explicitly catching a checked OOM
> >>>> exception and either decide to handle it or produce a message like
> "Hash
> >>>> Aggregate does not support our of memory conditions". This would be
> >>>> particularly useful for debugging queries, as the user exception can
> >>>> provide context information about the current operation. This way
> users
> >>>>
> >>> can
> >>>
> >>>> have some idea about the part of their query that might be causing an
> >>>> excessive strain on system resources. I understand that there are also
> >>>> cases where operators competing for memory can make it a toss up to
> >>>> which
> >>>> will actually fail, but this would at least be a step to give more
> >>>>
> >>> detailed
> >>>
> >>>> information to users.
> >>>>
> >>>> On Tue, Jun 30, 2015 at 1:28 PM, Hanifi GUNES <hg@apache.org>
wrote:
> >>>>
> >>>>  I would propose throwing a checked exception encouraging explicit and
> >>>>> consistent handling of this event. Each sub-system has liberty to
> >>>>>
> >>>> decide
> >>>
> >>>> if
> >>>>
> >>>>> an OOM failure is fatal or non-fatal depending on its capabilities.
> >>>>>
> >>>> Also
> >>>
> >>>> if
> >>>>
> >>>>> at some point a sub-system needs to communicate with its callers
via
> a
> >>>>> different mechanism such like using flags (boolean, enum etc) or
> >>>>>
> >>>> raising
> >>>
> >>>> an
> >>>>
> >>>>> unchecked exception that's still fine, just handle the exception.
If
> >>>>>
> >>>> there
> >>>>
> >>>>> is a need to suppress the checked exception that's fine too, just
> use a
> >>>>> helper method.
> >>>>>
> >>>>> Either way, returning *null* sounds problematic in many ways i)
it is
> >>>>> implicit ii) unsafe iii) its handling logic is repetitive iv) it
is
> >>>>> semantically unclean to make null mean something - even worse
> something
> >>>>> context specific.
> >>>>>
> >>>>>
> >>>>> -Hanifi
> >>>>>
> >>>>> 2015-06-30 12:23 GMT-07:00 Abdel Hakim Deneche <
> adeneche@maprtech.com
> >>>>>
> >>>> :
> >>>>
> >>>>>
> >>>>>  I guess that would fix the issue too. But we may still run into
> >>>>>>
> >>>>> situations
> >>>>>
> >>>>>> where the caller will pass a flag to "mute" the exception and
not
> >>>>>>
> >>>>> handle
> >>>>
> >>>>> the case anyway.
> >>>>>>
> >>>>>> If .buffer() unconditionally throws an exception, can't the
caller,
> >>>>>>
> >>>>> who
> >>>
> >>>> wants to, just catch that and handle it properly ?
> >>>>>>
> >>>>>> On Tue, Jun 30, 2015 at 12:13 PM, Chris Westin <
> >>>>>>
> >>>>> chriswestin42@gmail.com>
> >>>>
> >>>>> wrote:
> >>>>>>
> >>>>>>  No, but we should do something close to that.
> >>>>>>>
> >>>>>>> There are cases where the caller can handle the inability
to get
> >>>>>>>
> >>>>>> more
> >>>
> >>>> memory, and may be able to go to disk. However, you are correct
> >>>>>>>
> >>>>>> that
> >>>
> >>>> there
> >>>>>>
> >>>>>>> are many that can't handle an OOM, and that fail to check.
> >>>>>>>
> >>>>>>> Instead of unconditionally throwing OutOfMemoryRuntimeException,
I
> >>>>>>>
> >>>>>> would
> >>>>>
> >>>>>> suggest that the buffer() call take a flag that indicates whether
> >>>>>>>
> >>>>>> or
> >>>
> >>>> not
> >>>>>
> >>>>>> it
> >>>>>>
> >>>>>>> should throw if it is unable to fulfill the request. This
way, the
> >>>>>>>
> >>>>>> call
> >>>>
> >>>>> sites that can handle an OOM can pass in the flag to return null,
> >>>>>>>
> >>>>>> and
> >>>
> >>>> the
> >>>>>
> >>>>>> rest can pass in the flag value to throw, and not have to have
any
> >>>>>>>
> >>>>>> checking
> >>>>>>
> >>>>>>> code.
> >>>>>>>
> >>>>>>>
> >>>>>>> On Tue, Jun 30, 2015 at 12:06 PM, Abdel Hakim Deneche <
> >>>>>>> adeneche@maprtech.com
> >>>>>>>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>
> >>>>>>>  our current implementations of BufferAllocator.buffer(int,
int)
> >>>>>>>>
> >>>>>>> returns
> >>>>>
> >>>>>> null when it cannot allocate the buffer.
> >>>>>>>>
> >>>>>>>> But looking through the code, there are many places
that don't
> >>>>>>>>
> >>>>>>> check
> >>>>
> >>>>> if
> >>>>>
> >>>>>> the
> >>>>>>>
> >>>>>>>> allocated buffer is null before trying to access it
which will
> >>>>>>>>
> >>>>>>> throw
> >>>>
> >>>>> a
> >>>>>
> >>>>>> NullPointerException.
> >>>>>>>>
> >>>>>>>> ValueVectors' allocateNewSafe() seem to be the only
place that
> >>>>>>>>
> >>>>>>> handle
> >>>>
> >>>>> the
> >>>>>>
> >>>>>>> null in a specific manner.
> >>>>>>>>
> >>>>>>>> Should we update the allocators' implementation to throw
an
> >>>>>>>> OutOfMemoryRuntimeException instead of returning null
? this has
> >>>>>>>>
> >>>>>>> the
> >>>>
> >>>>> added
> >>>>>>>
> >>>>>>>> benefit of displaying a proper out of memory error message
to the
> >>>>>>>>
> >>>>>>> user.
> >>>>>
> >>>>>>
> >>>>>>>> Thanks!
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>>
> >>>>>>>> Abdelhakim Deneche
> >>>>>>>>
> >>>>>>>> Software Engineer
> >>>>>>>>
> >>>>>>>>    <http://www.mapr.com/>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Now Available - Free Hadoop On-Demand Training
> >>>>>>>> <
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> >>>
> >>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>>
> >>>>>> Abdelhakim Deneche
> >>>>>>
> >>>>>> Software Engineer
> >>>>>>
> >>>>>>    <http://www.mapr.com/>
> >>>>>>
> >>>>>>
> >>>>>> Now Available - Free Hadoop On-Demand Training
> >>>>>> <
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> >>>
> >>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>> --
> >>>
> >>> Abdelhakim Deneche
> >>>
> >>> Software Engineer
> >>>
> >>>    <http://www.mapr.com/>
> >>>
> >>>
> >>> Now Available - Free Hadoop On-Demand Training
> >>> <
> >>>
> >>>
> http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available
> >>>
> >>>>
> >>>>
> >>>
> >>
> >
> > --
> > Daniel Barclay
> > MapR Technologies
> >
>



-- 

Abdelhakim Deneche

Software Engineer

  <http://www.mapr.com/>


Now Available - Free Hadoop On-Demand Training
<http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message