drill-dev mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-5602) Vector corruption when allocating a repeated map vector
Date Thu, 22 Jun 2017 03:53:00 GMT
Paul Rogers created DRILL-5602:
----------------------------------

             Summary: Vector corruption when allocating a repeated map vector
                 Key: DRILL-5602
                 URL: https://issues.apache.org/jira/browse/DRILL-5602
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.10.0
            Reporter: Paul Rogers
            Assignee: Paul Rogers
             Fix For: 1.11.0


The query in DRILL-5513 highlighted a problem described in DRILL-5594: that the external sort
did not properly allocate its spill batch vectors, and instead allowed them to grow by doubling.
While fixing that issue, a new issue became clear.

The method that allocates a repeated map vector has a serious bug, rooted in the behavior
described in DRILL-5530: value vectors do not zero-fill the first allocation for a vector
(though subsequent reallocs are zero-filled).

If the code worked correctly, here is the behavior when writing to the first element of the
list:

* Access the offset vector at offset 0. Should be 0.
* Write the new value at that offset. Since the first offset is 0, the first value is written
at 0 in the value vector.
* Write into offset 1 the value at offset 0 plus the length of the new value.

But, the offset vector is not initialized to zero. Instead, offset 0 contains the value 16
million. Now:

* Access the offset vector at offset 0. Value is 16 million.
* Write the new value at that offset. Write at position 16 million. This requires growing
the value vector from its present size to 16 MB.
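
The two scenarios above can be reproduced with a toy model. This is only a sketch: {{OffsetWriteSketch}} is a hypothetical stand-in built on plain Java arrays, not Drill's vector classes, and the 16-byte starting buffer and grow-by-doubling policy are assumptions chosen for illustration.

```java
import java.util.Arrays;

/** Toy model of a variable-width vector: an offsets array plus a byte buffer. */
final class OffsetWriteSketch {
    int[] offsets;
    byte[] values = new byte[16];   // small initial value buffer (assumed size)

    OffsetWriteSketch(int[] offsets) { this.offsets = offsets; }

    /** Writes entry `index` using the three steps listed above; returns the start position. */
    int write(int index, byte[] value) {
        int start = offsets[index];                 // step 1: read the start offset
        int end = start + value.length;
        int capacity = values.length;
        while (capacity < end) { capacity *= 2; }   // grow by doubling until the write fits
        if (capacity != values.length) { values = Arrays.copyOf(values, capacity); }
        System.arraycopy(value, 0, values, start, value.length);  // step 2: write the value
        offsets[index + 1] = end;                   // step 3: record where the next entry starts
        return start;
    }

    public static void main(String[] args) {
        byte[] value = {1, 2, 3};

        // Correct case: offset 0 is zero, so the value lands at position 0.
        OffsetWriteSketch clean = new OffsetWriteSketch(new int[] {0, 0});
        clean.write(0, value);
        System.out.println("clean buffer size: " + clean.values.length);   // 16

        // Simulated stale memory: offset 0 holds 16 million instead of 0.
        OffsetWriteSketch dirty = new OffsetWriteSketch(new int[] {16_000_000, 0});
        dirty.write(0, value);
        System.out.println("dirty buffer size: " + dirty.values.length);   // 16777216
    }
}
```

In this model a three-byte write into a vector with a stale first offset forces the value buffer from 16 bytes to 16 MiB, which is the same shape of failure as the one described above.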

The problem is here in {{RepeatedMapVector}}:

{code}
  public void allocateOffsetsNew(int groupCount) {
    offsets.allocateNew(groupCount + 1);
  }
{code}

Notice that there is no code to set the value at offset 0.
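
As a rough illustration of what is missing, here is a self-contained sketch that uses {{java.nio.ByteBuffer}} in place of Drill's buffers. The names {{dirtyBuffer}}, {{allocateOffsetsUnsafe}} and {{allocateOffsetsZeroed}} are hypothetical, and the {{0x01}} fill pattern is just an assumed example of stale memory. It shows one possible remedy (explicitly zeroing entry 0 after allocation), not necessarily the change that will be made in Drill.

```java
import java.nio.ByteBuffer;

final class OffsetAllocSketch {
    static final int WIDTH = 4;  // bytes per offset entry

    /** Hypothetical allocator stand-in: returns a buffer filled with stale non-zero bytes. */
    static ByteBuffer dirtyBuffer(int bytes) {
        ByteBuffer buf = ByteBuffer.allocate(bytes);
        for (int i = 0; i < bytes; i++) { buf.put(i, (byte) 0x01); }
        return buf;
    }

    /** Mirrors the reported method: allocates groupCount + 1 offsets and does nothing else. */
    static ByteBuffer allocateOffsetsUnsafe(int groupCount) {
        return dirtyBuffer((groupCount + 1) * WIDTH);
    }

    /** One possible remedy: explicitly set entry 0 to zero before the buffer is used. */
    static ByteBuffer allocateOffsetsZeroed(int groupCount) {
        ByteBuffer buf = dirtyBuffer((groupCount + 1) * WIDTH);
        buf.putInt(0, 0);
        return buf;
    }

    public static void main(String[] args) {
        // 0x01010101 read as an int is 16843009, i.e. roughly 16 million.
        System.out.println(allocateOffsetsUnsafe(10).getInt(0));   // 16843009
        System.out.println(allocateOffsetsZeroed(10).getInt(0));   // 0
    }
}
```

Only entry 0 needs to be zero for the write path to work, since each later entry is derived from the one before it. Zeroing the whole buffer would also work at a higher cost.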

Then, in the {{UInt4Vector}}:

{code}
  public void allocateNew(final int valueCount) {
    allocateBytes(valueCount * 4);
  }

  private void allocateBytes(final long size) {
    ...
    data = allocator.buffer(curSize);
    ...
  }
{code}

The above eventually calls the Netty memory allocator, which explicitly states that, for performance
reasons, it does not zero-fill its buffers.

The code works in small tests because the new buffer comes from Java direct memory, which
*does* zero-fill the buffer.
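
A toy recycling pool shows why small tests can pass while larger runs fail. This is a simplified sketch, not Netty's allocator: {{RecyclingPoolSketch}} and its methods are hypothetical, and the only library behavior relied on is that {{ByteBuffer.allocateDirect}} returns zero-initialized memory.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

final class RecyclingPoolSketch {
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();

    /** Reuses a released buffer if one fits; otherwise asks the JVM for fresh direct memory. */
    ByteBuffer buffer(int bytes) {
        ByteBuffer recycled = free.poll();   // a too-small buffer is simply discarded in this toy pool
        if (recycled != null && recycled.capacity() >= bytes) {
            return recycled;                 // returned as-is: the old contents are still there
        }
        return ByteBuffer.allocateDirect(bytes);   // fresh direct memory is zero-initialized
    }

    void release(ByteBuffer buf) { free.push(buf); }

    public static void main(String[] args) {
        RecyclingPoolSketch pool = new RecyclingPoolSketch();

        ByteBuffer first = pool.buffer(16);
        System.out.println(first.getInt(0));    // 0: fresh memory, so the bug stays hidden

        first.putInt(0, 16_000_000);            // some earlier use leaves data behind
        pool.release(first);

        ByteBuffer second = pool.buffer(16);
        System.out.println(second.getInt(0));   // 16000000: stale data from the previous use
    }
}
```

A short test that only ever sees freshly allocated memory never observes the stale value; it appears once buffers start being reused.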




