drill-dev mailing list archives

From Jinfeng Ni <jinfengn...@gmail.com>
Subject Re: buffer allocation of cast into var length type
Date Wed, 04 Dec 2013 02:59:14 GMT
Hi Jason,

Good question.

Actually, some type casts are *binary coercible*, meaning that no internal
conversion is needed. For instance, char --> varchar,
varchar --> varbinary, etc.
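
To make the binary-coercible case concrete, here is a minimal sketch in plain Java. It does not use Drill's classes: BinaryCoercibleSketch and varCharToVarBinary are hypothetical names, and java.nio.ByteBuffer stands in for the vector's data buffer. The point is only that the "cast" returns a second view over the same bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class BinaryCoercibleSketch {
    // Hypothetical helper: "casts" varchar bytes to varbinary by sharing storage.
    static ByteBuffer varCharToVarBinary(ByteBuffer varCharBytes) {
        // duplicate() creates a new view over the same underlying bytes;
        // no byte is copied or transformed.
        return varCharBytes.duplicate();
    }

    public static void main(String[] args) {
        ByteBuffer varChar = ByteBuffer.wrap("drill".getBytes(StandardCharsets.UTF_8));
        ByteBuffer varBinary = varCharToVarBinary(varChar);
        // Both views are backed by the same array.
        System.out.println(varBinary.array() == varChar.array());
        System.out.println(varBinary.remaining());
    }
}
```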

For other cases, some transformation is required, since the binary
representation of the source type differs from the binary representation
of the target type.
For instance, int -> varchar: the target type needs to keep each digit of the
integer, while the source type is a 4-byte representation.
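
The difference in representation can be seen in a small plain-Java sketch. IntToVarCharSketch and its two methods are hypothetical names, not Drill code, and the big-endian byte order is just ByteBuffer's default, not a claim about Drill's layout.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class IntToVarCharSketch {
    // The fixed-length form: always 4 bytes, whatever the value.
    static byte[] fixedWidthBytes(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // The var-length form: one byte per character of the decimal text.
    static byte[] varCharBytes(int value) {
        return Integer.toString(value).getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        int v = 12345;
        System.out.println(Arrays.toString(fixedWidthBytes(v))); // [0, 0, 48, 57]
        System.out.println(Arrays.toString(varCharBytes(v)));    // [49, 50, 51, 52, 53]
    }
}
```

The two byte sequences share nothing, so the input buffer cannot simply be handed over as it can in the binary-coercible case.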

I will look into whether it's possible to use the buffer in the output
value vector directly, without copying into a new buffer.
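
As a rough idea of what writing into the output buffer directly could look like, here is a sketch under stated assumptions. VarLenOutput is a hypothetical stand-in for a var-length value vector (one data buffer plus an offsets array), not Drill's actual class, and castIntInto is an invented name. The caller is assumed to pre-allocate up to 11 bytes per int, since "-2147483648" is the longest decimal form.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class DirectWriteSketch {
    // Hypothetical stand-in for a var-length output vector.
    static final class VarLenOutput {
        final ByteBuffer data;
        final int[] offsets; // offsets[i]..offsets[i + 1] delimit value i
        int count = 0;

        VarLenOutput(int valueCapacity, int byteCapacity) {
            data = ByteBuffer.allocate(byteCapacity);
            offsets = new int[valueCapacity + 1];
        }

        String get(int i) {
            return new String(data.array(), offsets[i],
                    offsets[i + 1] - offsets[i], StandardCharsets.US_ASCII);
        }
    }

    // Writes the decimal digits of value straight into out.data (one copy).
    static void castIntInto(int value, VarLenOutput out) {
        long v = value; // widen so that negating Integer.MIN_VALUE is safe
        if (v < 0) {
            out.data.put((byte) '-');
            v = -v;
        }
        int digits = 1;
        for (long t = v; t >= 10; t /= 10) {
            digits++;
        }
        int end = out.data.position() + digits;
        // Fill digits from the least significant one, right to left.
        for (int p = end - 1; p >= end - digits; p--) {
            out.data.put(p, (byte) ('0' + (v % 10)));
            v /= 10;
        }
        out.data.position(end);
        out.count++;
        out.offsets[out.count] = end;
    }

    public static void main(String[] args) {
        VarLenOutput out = new VarLenOutput(2, 2 * 11);
        castIntInto(42, out);
        castIntInto(-7, out);
        System.out.println(out.get(0) + " " + out.get(1));
    }
}
```

No intermediate buffer is involved here: each value is converted once, in place, and only the offsets array records where it ends.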





On Tue, Dec 3, 2013 at 6:29 PM, Jason Altekruse <altekrusejason@gmail.com> wrote:

> Hi Jinfeng,
>
> This might be a dumb question, but is there any transformation being
> performed when going from a fixed length type to a variable length type?
> That is, are the bytes in the buffer coming in going to be the same as the
> bytes coming out of the cast?
>
> I understand that for casts like int-> long we need to add extra space
> between each value, but is it possible that we could just hand the buffer
> from one value vector type to the other without copying it into a new
> buffer?
>
> We would still have to create a new buffer with the offsets of the
> "variable length" values, but it would save us some time if we could do
> this.
>
> -Jason Altekruse
>
>
> On Tue, Dec 3, 2013 at 5:35 PM, Jinfeng Ni <jinfengni99@gmail.com> wrote:
>
> > Hi all,
> >
> > I'm working on the explicit cast support in Drill. So far, I have
> > prototyped the implementation for the first 3 categories, and would like
> > to seek input from you regarding how to deal with the buffer allocation
> > for casts from a fixed-length type into a var-length type.
> >
> > 1. cast from fixed-length type to fixed-length type
> > eg:   float4 --> int,
> >         int -> float4,
> >
> > 2. cast from var-length type to fixed-length type
> > eg: varchar --> int
> >       varbinary --> int
> > (I still need to figure out how to handle overflow when casting)
> >
> > 3. cast from fixed-length type to var-length type
> > eg:  int  -> varchar
> >        bigint -> varbinary
> >
> > 4. cast from var-length type to var-length type
> > eg:   varchar --> varchar
> >         varbinary --> varchar
> >
> > The 3rd one, i.e. from fixed-length to var-length type, causes a
> > problem for the current implementation in terms of buffer allocation.
> >
> > For fixed-length types, Drill uses a Java primitive type in the
> > ValueHolder. For instance, IntHolder.value is an int. But for var-length
> > types, Drill uses a buffer to keep the value. When casting from int to
> > varchar, the buffer for the VarCharHolder is not allocated, and we have
> > to figure out a way to do the allocation before the cast.
> >
> > There seem to be 2 options:
> > Option 1: allocate the buffer in the function template's setup() method.
> > The buffer will be used in the eval() method.
> > Problems with this option:
> > 1) It needs two copies: first from the fixed-type input into the buffer
> > allocated in setup(), then from that buffer into the buffer in the
> > target vector.
> > 2) It needs a cleanup() method added to the function template to release
> > the allocated buffer, which currently does not exist in the code base.
> >
> > Option 2: the consumer of the cast function's output is responsible for
> > pre-allocating the buffer in the target ValueVector for all the
> > VarCharHolder() instances. The cast function simply does the conversion
> > and copies the result into the pre-allocated buffer in the target
> > ValueVector.
> > The advantage of this option is that it requires only 1 copy.
> >
> > I have prototyped the 1st option, and have not figured out how to
> > implement the 2nd approach yet. But I would like to seek suggestions
> > regarding these 2 options before I proceed.
> >
> > Thanks!
> >
>
