drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From GitBox <...@apache.org>
Subject [GitHub] [drill] paul-rogers edited a comment on pull request #2183: DRILL-6547: IllegalStateException: Tried to remove unmanaged buffer in concat function
Date Fri, 05 Mar 2021 01:24:28 GMT

paul-rogers edited a comment on pull request #2183:
URL: https://github.com/apache/drill/pull/2183#issuecomment-791072405

   Thanks for fixing this.
   I suspect the problem here runs deeper than this particular fix. This is Drill: nothing
is simple, things build on hacks which build on undocumented complexities. I suspect that
is the issue here.
   In particular, we need to know the rules for buffer ownership in UDFs which is, of course,
not documented. The only reasonable way to learn this is to run a query, save the generated
Java code and inspect the calls. Warning: the generated code is a huge mess.
   The key question is this: when you declare an `@Inject` variable (`buffer` in your case),
does the runtime provide an initial buffer? My guess is that it does in the form of a "managed"
buffer. (This just means that there is some auto-free mechanism.) I believe that the buffer
is allocated on each call, but the generated code will say for sure. (The UDF code is copied
into the generated code.)
   The function declares an `@Output`, in this case `out`. If this variable provides a buffer,
it would seem that ownership has to pass to the caller, since the caller will copy that buffer
into a value vector. My guess is that if the buffer is "managed", it will be freed automatically
at some point.
   (Note that managed buffers can cause an OOM because of when they are freed: I think it
is during fragment shutdown, IIRC. I had to fix a reader bug related to this issue.)
   To summarize, it appears that the proper pattern in UDFs is:
   * Declare an `@Inject` buffer which the runtime will provide.
   * Resize the buffer if needed, using the return value from `reallocIfNeeded`.
   * Use that buffer as the output buffer by setting it to `out.buffer` where needed.
   * Managed buffers are freed automatically, at some unknown point.
   If this is true, there is no reason to update `buffer`, the proper pattern is shown in
(with my own comments):
   ``` java
       // Inject means that the runtime will provide a default-size buffer on each call
       @Inject DrillBuf buffer;
       // Holder for output. NOT initialized b the runtime.
       @Output ${type.outputType}Holder out;
           // Resize the provided buffer, replacing if necessary, and assigning that
           // to the output variable.
           out.buffer = buffer.reallocIfNeeded(len);
           // Using the now-valid output buffer, copy in any needed data.
           out.buffer.setBytes(0, bytes);
           // Upon exit, is the output buffer is now owned by caller? Must be, or the output
           // data would be immediately freed if the caller were to release the injected
           // buffer, and all hell would break loose.
   As usual, there is no documentation of this, you just gotta know (or figure it out every
   The buffer management happens in `BufferManagerImpl`, as shown in the JIRA stack trace,
which is called by `reallocIfNeeded`:
     public DrillBuf replace(DrillBuf old, int newSize) {
       if (managedBuffers.remove(old.memoryAddress()) == null) {
         throw new IllegalStateException("Tried to remove unmanaged buffer.");
       return getManagedBuffer(newSize);
   This is saying, "Hey, that buffer you gave me? It ain't one of mine." The question is why.
The theory in this bug is that we forgot to replace our working buffer. Let's check that by
doing a [search for `reallocIfNeeded`](https://github.com/apache/drill/search?p=2&q=reallocIfNeeded).
We find many cases where we don't save the reallocated buffer back into the `@Inject` variable.
For [example](https://github.com/apache/drill/blob/2eb6979206ad1b6f830a070dffa822edbfac6189/exec/java-exec/src/main/codegen/templates/Decimal/DecimalAggrTypeFunctions1.java#L113):
   ``` java
           out.buffer = buffer.reallocIfNeeded(len);
           out.buffer.setBytes(0, bytes);
   I *suspect*, but am not sure without running the code, that every call to the UDF is issued
a new managed `buffer`. So, I suspect that the problem will only occur if, within a single
function call, we:
   * Call `reallocIfNeeded` and get a new buffer.
   * Call `reallocIfNeeded` on the original (now unregistered) buffer, but only if the previous
call actually resized the buffer.
   Looking at the code in this function, there are no loops, so there is no ability to make
the above mistake. Yet, many other functions in the file changed here have the exact same
code pattern introduced in this fix. Why does the code work so often if many other functions
*do not* follow the pattern adopted here?
   There are two possibilities:
   * The generated code has a bug: it is not providing a new buffer on each function invocation.
   * The problem is not where we think it is: it is elsewhere. (The stack trace in the JIRA
ticket shows generated code, so we can't be sure which bit of code is actually making the
   In particular, I'm not convinced that the unit test reproduces the error. The unit test
is different from the query in the JIRA ticket. A good practice is to run the proposed test
*before* any fix to make sure we can reproduce the issue.
   The proposed test (simplified):
   ``` sql
   select concat(cast(null as varchar), EXPR$0) as c1
               "from (values('foo'), ('bar'))
   User's query (also simplified):
   ``` sql
   select concat(BinaryValue, 'aaa') NewVarcharValue from ...;
   What is different? It may not be obvious, but in the first query, we have a `NULL` const
and a column (value vector, which happens to be defined by a `VALUES` clause with a constant).
In the second, we have a column value and a string constant. That string constant is not stored
in a value vector, it is in a temporary buffer (IIRC).
   My hunch is that the problem is with the constant: it is causing something to break. This
is why so many of the functions in `StringFunctions.java` have code like the proposed fix:
they are also likely tested with constants. If the problem is with constants, then it means
that all other UDFs that work with strings, but don't have the proposed fix, will fail when
used with constants.
   That is, sounds like we got us a bit of a problem; this fix is just the tip of the iceberg.
   How should we proceed? We can do just this fix, and fix other issues as people encounter
and report them. Or, we can figure out what is *really* going on by looking at the generated
code. To do that, modify the [Project operator](https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectRecordBatch.java#L279):
   ``` java
       boolean saveCode = true; \\ change from false in the master branch
   Set the variable to `true` as shown above (it will be `false` by default.) Run a query
like the user's. Code will be generated. By default I think it is in `/tmp/drill/codegen/...`.
Analyze what's going on with the `@Inject` buffer.
   Pay particular attention to how constants are handled. What is causing us to try to reallocate
a non-managed buffer in this case?
   As it turns out, with this trick, you can actually *debug* the generated code. A bit after
the above line is one that generates the file. Set a breakpoint after that line. Then, go
hunt down the generated file. Set a breakpoint somewhere. Let the code run. It will stop at
your breakpoint in the generated code. This is easy in Eclipse. IIRC, IntelliJ users have
to do some trick to put the code in a path that IntelliJ likes. Jinfeng may have left a note
somewhere from how he made it work many years ago.
   Then, we can decide how to proceed:
   * If the fix here is the correct one, make it in all UDFs that have the wrong pattern.
(See the search above, there are **many** such occurrences.) Add a comment to the `reallocIfNeeded`
method to explain the proper usage. Provide a code snippet to show what should be done.
   * If the problem is with constant handling, work out what should be done. Fix it. Revert
the `out.buffer = buffer = buffer.ralloc...` bits of code in `StringFunctions.java` and retest
to ensure constants work correctly.
   BTW, in researching these issues, I found a number of other problems.
   If the proposed fix is correct, then a particularly nasty buf is [here](https://github.com/apache/drill/blob/2eb6979206ad1b6f830a070dffa822edbfac6189/exec/vector/src/main/java/org/apache/drill/exec/vector/ValueHolderHelper.java#L121)
where the new buffer is created internally and never returned.
   The function we're trying to fix, `ConcatLeftNullInput`, has a bug:
   ``` java
   public static class ConcatLeftNullInput implements DrillSimpleFunc {
       public void eval() {
         out.buffer = buffer.reallocIfNeeded( (left.end - left.start) + (right.end - right.start));
   The function is `...LeftNullInput` and, yet, we are asking for the `left.end` and `left.start`.
By definition these are zero. Why? Because the left is `NULL`. In fact, if the left value
is `NULL`, I'm not sure the `start` and `end` values are even defined to be any particular
value. (Amazing this stuff works as well as it does.)
   In the same file, this code is *just plain wrong*:
   ``` java
   public static class Replace implements DrillSimpleFunc {
       @Inject DrillBuf buffer;
       @Output VarCharHolder out;
       public void setup() {
         buffer = buffer.reallocIfNeeded(8000);
   See the bug? What if the new string is larger than 8K? Kaboom. What if the string is 10
characters? Huge waste of memory. The above is what we call "crude and not quite effective",
it works most of the time, except when it doesn't.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:

View raw message