Before, the input was a byte array so we could read from it directly. Now, the input is a `ByteBufferInputStream` so that Parquet can choose how to allocate buffers. For example, we use vectored reads from S3 that pull back multiple buffers in parallel.

Now that the input is a stream over possibly multiple byte buffers, it provides a method to get a buffer of a certain length. In most cases, that creates a `ByteBuffer` that shares the same backing byte array, but it may need to copy if the request spans multiple buffers in the stream. Most of the time, the call to `slice` only requires duplicating the buffer and setting its limit, but a read that spans multiple buffers is expensive because it has to allocate and copy. It would be helpful to know whether the time is spent copying data, which would indicate the backing buffers are too small, or duplicating the backing byte buffer.
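
To make the cost difference concrete, here is a minimal sketch of the two paths such a call can take. The class and method names (`MultiBufferStream`, `slice`) are hypothetical, not Parquet's actual `ByteBufferInputStream`; this only illustrates why the per-call cost varies so much.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch, NOT Parquet's ByteBufferInputStream: it exists only to
// show the two cost profiles a slice(len) call can have.
class MultiBufferStream {
  private final Deque<ByteBuffer> buffers = new ArrayDeque<>();

  MultiBufferStream(ByteBuffer... bufs) {
    for (ByteBuffer b : bufs) {
      buffers.add(b.duplicate());
    }
  }

  // Returns the next len bytes as a ByteBuffer.
  // Assumes at least len bytes remain in the stream.
  ByteBuffer slice(int len) {
    ByteBuffer head = buffers.peekFirst();
    if (head != null && head.remaining() >= len) {
      // Cheap path: share the backing bytes; just duplicate and set the limit.
      ByteBuffer slice = head.duplicate();
      slice.limit(slice.position() + len);
      head.position(head.position() + len);
      if (!head.hasRemaining()) {
        buffers.pollFirst();
      }
      return slice;
    }
    // Expensive path: the request spans buffers, so allocate and copy.
    ByteBuffer copy = ByteBuffer.allocate(len);
    while (copy.hasRemaining()) {
      ByteBuffer b = buffers.peekFirst();
      int n = Math.min(b.remaining(), copy.remaining());
      for (int i = 0; i < n; i++) {
        copy.put(b.get());
      }
      if (!b.hasRemaining()) {
        buffers.pollFirst();
      }
    }
    copy.flip();
    return copy;
  }
}
```

A profiler can't tell the two paths apart from the call site alone, which is why it matters whether the time goes to duplicating (cheap) or to allocating and copying (a sign the backing buffers are too small).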

On Mon, Sep 14, 2020 at 5:29 AM Sean Owen <srowen@gmail.com> wrote:
Ryan, do you happen to have any opinion there? That particular section
was introduced in the Parquet 1.10 update:
https://github.com/apache/spark/commit/cac9b1dea1bb44fa42abf77829c05bf93f70cf20
It looks like it didn't use to make a ByteBuffer each time, but read from in.

On Sun, Sep 13, 2020 at 10:48 PM Chang Chen <baibaichen@gmail.com> wrote:
>
> I think we can copy all encoded data into a ByteBuffer once, and unpack values in the loop
>
> while (valueIndex < this.currentCount) {
>   // values are bit packed 8 at a time, so reading bitWidth will always work
>   this.packer.unpack8Values(buffer, buffer.position() + valueIndex, this.currentBuffer, valueIndex);
>   valueIndex += 8;
> }
>
> Sean Owen <srowen@gmail.com> 于2020年9月14日周一 上午10:40写道:
>>
>> It certainly can't be called once - it's reading different data each time.
>> There might be a faster way to do it, I don't know. Do you have ideas?
>>
>> On Sun, Sep 13, 2020 at 9:25 PM Chang Chen <baibaichen@gmail.com> wrote:
>> >
>> > Hi experts,
>> >
>> > it looks like there is a hot spot in VectorizedRleValuesReader#readNextGroup()
>> >
>> > case PACKED:
>> >   int numGroups = header >>> 1;
>> >   this.currentCount = numGroups * 8;
>> >
>> >   if (this.currentBuffer.length < this.currentCount) {
>> >     this.currentBuffer = new int[this.currentCount];
>> >   }
>> >   currentBufferIdx = 0;
>> >   int valueIndex = 0;
>> >   while (valueIndex < this.currentCount) {
>> >     // values are bit packed 8 at a time, so reading bitWidth will always work
>> >     ByteBuffer buffer = in.slice(bitWidth);
>> >     this.packer.unpack8Values(buffer, buffer.position(), this.currentBuffer, valueIndex);
>> >     valueIndex += 8;
>> >   }
>> >
>> >
>> > Per my profiling, the code spends about 30% of readNextGroup() time on slice. Why can't we call slice outside of the loop?


--
Ryan Blue