spark-dev mailing list archives

From Chang Chen <baibaic...@gmail.com>
Subject Re: Performance of VectorizedRleValuesReader
Date Mon, 14 Sep 2020 03:48:41 GMT
I think we can copy all of the encoded data into a ByteBuffer once, and
then unpack the values in the loop:

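  // Slice all of the packed bytes once, outside the loop.
  ByteBuffer buffer = in.slice(numGroups * bitWidth);
  int start = buffer.position();
  while (valueIndex < this.currentCount) {
    // values are bit packed 8 at a time, so each group occupies bitWidth bytes
    this.packer.unpack8Values(buffer, start + (valueIndex / 8) * bitWidth,
        this.currentBuffer, valueIndex);
    valueIndex += 8;
  }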
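
For reference, here is a minimal, self-contained sketch of the slice-once
idea. It is not the Spark or Parquet code: slice() and unpack8Values() below
are hypothetical stand-ins for ByteBufferInputStream#slice and the Parquet
packer, written only so the example runs on its own, and they assume a
bitWidth of 1 to 8 with values packed least-significant bit first. Each group
of 8 values occupies bitWidth bytes, so the byte offset of a group is
(valueIndex / 8) * bitWidth past the buffer's starting position.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class BulkUnpackSketch {

  // Hypothetical stand-in for the packer: reads 8 values of bitWidth bits each
  // (LSB first) from `in`, starting at absolute byte index inPos.
  // Assumes 1 <= bitWidth <= 8 so that one group fits in a long.
  static void unpack8Values(ByteBuffer in, int inPos, int bitWidth, int[] out, int outPos) {
    long bits = 0;
    for (int i = 0; i < bitWidth; i++) {
      bits |= (in.get(inPos + i) & 0xFFL) << (8 * i);
    }
    long mask = (1L << bitWidth) - 1;
    for (int i = 0; i < 8; i++) {
      out[outPos + i] = (int) ((bits >>> (i * bitWidth)) & mask);
    }
  }

  // Hypothetical stand-in for in.slice(n): returns a view over the next n bytes
  // and advances the source. The view keeps the source's index space, which is
  // why callers add buffer.position() to their offsets.
  static ByteBuffer slice(ByteBuffer src, int n) {
    ByteBuffer view = src.duplicate();
    view.limit(view.position() + n);
    src.position(src.position() + n);
    return view;
  }

  // Shape of the existing loop: one slice per group of 8 values.
  static int[] unpackPerGroup(byte[] data, int bitWidth, int numGroups) {
    ByteBuffer in = ByteBuffer.wrap(data);
    int[] out = new int[numGroups * 8];
    for (int valueIndex = 0; valueIndex < out.length; valueIndex += 8) {
      ByteBuffer buffer = slice(in, bitWidth);
      unpack8Values(buffer, buffer.position(), bitWidth, out, valueIndex);
    }
    return out;
  }

  // Proposed shape: slice once, then step through the buffer by byte offset.
  static int[] unpackBulk(byte[] data, int bitWidth, int numGroups) {
    ByteBuffer in = ByteBuffer.wrap(data);
    int[] out = new int[numGroups * 8];
    ByteBuffer buffer = slice(in, numGroups * bitWidth);
    int start = buffer.position();
    for (int valueIndex = 0; valueIndex < out.length; valueIndex += 8) {
      int byteOffset = start + (valueIndex / 8) * bitWidth;
      unpack8Values(buffer, byteOffset, bitWidth, out, valueIndex);
    }
    return out;
  }

  public static void main(String[] args) {
    // Two groups of 8 three-bit values, 3 bytes per group.
    byte[] data = {(byte) 0x88, (byte) 0xC6, (byte) 0xFA, (byte) 0x88, (byte) 0xC6, (byte) 0xFA};
    System.out.println(Arrays.toString(unpackPerGroup(data, 3, 2)));
    System.out.println(Arrays.toString(unpackBulk(data, 3, 2)));
  }
}
```

Both methods should decode the same values. Whether the single slice is
actually faster in Spark would still need to be confirmed with a profiler or
a benchmark.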

On Mon, Sep 14, 2020 at 10:40 AM Sean Owen <srowen@gmail.com> wrote:

> It certainly can't be called once - it's reading different data each time.
> There might be a faster way to do it, I don't know. Do you have ideas?
>
> On Sun, Sep 13, 2020 at 9:25 PM Chang Chen <baibaichen@gmail.com> wrote:
> >
> > Hi experts,
> >
> > It looks like there is a hot spot in VectorizedRleValuesReader#readNextGroup():
> >
> > case PACKED:
> >   int numGroups = header >>> 1;
> >   this.currentCount = numGroups * 8;
> >
> >   if (this.currentBuffer.length < this.currentCount) {
> >     this.currentBuffer = new int[this.currentCount];
> >   }
> >   currentBufferIdx = 0;
> >   int valueIndex = 0;
> >   while (valueIndex < this.currentCount) {
> >     // values are bit packed 8 at a time, so reading bitWidth will always work
> >     ByteBuffer buffer = in.slice(bitWidth);
> >     this.packer.unpack8Values(buffer, buffer.position(),
> >         this.currentBuffer, valueIndex);
> >     valueIndex += 8;
> >   }
> >
> >
> > According to my profiling, the code spends 30% of the time in
> > readNextGroup() on slice. Why can't we call slice outside the loop?
>
