drill-dev mailing list archives

From Jinfeng Ni <jinfengn...@gmail.com>
Subject buffer allocation of cast into var length type
Date Tue, 03 Dec 2013 23:35:25 GMT
Hi all,

I'm working on explicit cast support in Drill. So far, I have prototyped
the implementation for the first 3 categories below, and would like your
input on how to handle buffer allocation when casting from a fixed-length
type to a var-length type.

1. cast from fixed-length type to fixed-length type
eg: float4 --> int
    int --> float4

2. cast from var-length type to fixed-length type
eg: varchar --> int
    varbinary --> int
(Still need to figure out how to handle overflow when casting)

3. cast from fixed-length type to var-length type
eg: int --> varchar
    bigint --> varbinary

4. cast from var-length type to var-length type
eg: varchar --> varchar
    varbinary --> varchar
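
On the overflow question in category 2, one possible approach is to accumulate the digits in a wider type and reject values that do not fit in the target. The sketch below is only an illustration: VarCharToIntSketch and its cast() method are hypothetical names, not part of Drill, and it assumes the varchar holds plain ASCII decimal digits with an optional leading minus sign.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not a Drill class: varchar --> int with overflow detection.
final class VarCharToIntSketch {
    static int cast(byte[] text) {
        if (text.length == 0) {
            throw new NumberFormatException("empty input");
        }
        int i = 0;
        boolean negative = text[0] == '-';
        if (negative) {
            i++;
        }
        if (i == text.length) {
            throw new NumberFormatException("no digits");
        }
        long magnitude = 0; // wider than int, so overflow can be detected before narrowing
        for (; i < text.length; i++) {
            int digit = text[i] - '0';
            if (digit < 0 || digit > 9) {
                throw new NumberFormatException("not a digit at index " + i);
            }
            magnitude = magnitude * 10 + digit;
            // 2^31 is the largest magnitude any int can have (Integer.MIN_VALUE).
            if (magnitude > (long) Integer.MAX_VALUE + 1) {
                throw new ArithmeticException("int overflow");
            }
        }
        long signed = negative ? -magnitude : magnitude;
        if (signed > Integer.MAX_VALUE) {
            throw new ArithmeticException("int overflow");
        }
        return (int) signed;
    }

    static int cast(String text) {
        return cast(text.getBytes(StandardCharsets.US_ASCII));
    }
}
```

Whether an overflow should raise an error, return null, or saturate is a separate design decision; the sketch throws only to make the detection point visible.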

The 3rd category, i.e. fixed-length to var-length type, causes a problem
for the current implementation in terms of buffer allocation.

For a fixed-length type, Drill uses a Java primitive type in the
ValueHolder. For instance, IntHolder.value is an int. For a var-length
type, however, Drill uses a buffer to hold the value. When casting from int
to varchar, the buffer for the VarCharHolder has not been allocated, so we
have to figure out a way to allocate it before the cast.
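
To make the difference concrete, here is a minimal sketch. IntHolderSketch and VarCharHolderSketch are simplified, hypothetical stand-ins (the real Drill holders may have different fields and use a different buffer type than java.nio.ByteBuffer); the point is only that the cast has nowhere to write until someone allocates the buffer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a fixed-length holder: the value lives in the holder itself.
final class IntHolderSketch {
    int value;
}

// Hypothetical stand-in for a var-length holder: the value lives in a separate buffer.
final class VarCharHolderSketch {
    int start;         // offset of the first byte of the value in buffer
    int end;           // offset one past the last byte
    ByteBuffer buffer; // null until some component allocates it
}

final class HolderDemo {
    // int --> varchar. Throws NullPointerException if out.buffer was never allocated.
    static void castIntToVarChar(IntHolderSketch in, VarCharHolderSketch out) {
        byte[] digits = Integer.toString(in.value).getBytes(StandardCharsets.US_ASCII);
        for (int i = 0; i < digits.length; i++) {
            out.buffer.put(i, digits[i]);
        }
        out.start = 0;
        out.end = digits.length;
    }
}
```

Calling castIntToVarChar with a freshly constructed VarCharHolderSketch fails, while setting out.buffer = ByteBuffer.allocate(11) first (11 bytes is enough for any int, including the sign) lets it succeed.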

There seem to be 2 options:
Option 1:  allocate the buffer in the function template's setup() method.
The buffer is then used in the eval() method.
Problems with this option:
1) it needs two copies: first from the fixed-type input into the buffer
allocated in setup(), then from that buffer into the buffer in the target
vector.
2) it needs a cleanup() method added to the function template to release
the allocated buffer, which does not currently exist in the code base.
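
A minimal sketch of Option 1, under stated assumptions: CastWithScratchBuffer is a hypothetical class, java.nio.ByteBuffer stands in for whatever buffer type Drill uses, and the setup()/eval()/cleanup() signatures are simplified for illustration. The copies counter exists only to make the two copies visible.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of Option 1, not Drill's function template API.
final class CastWithScratchBuffer {
    private ByteBuffer scratch;
    int copies; // counts buffer-to-buffer copies, for illustration only

    // Allocate the scratch buffer once; 11 bytes fits any int, e.g. "-2147483648".
    void setup() {
        scratch = ByteBuffer.allocate(11);
    }

    // int --> varchar; appends to target and returns the number of bytes written.
    int eval(int value, ByteBuffer target) {
        // The byte[] stands in for the conversion itself.
        byte[] digits = Integer.toString(value).getBytes(StandardCharsets.US_ASCII);

        scratch.clear();
        scratch.put(digits); // copy 1: converted value into the scratch buffer
        copies++;

        scratch.flip();
        int length = scratch.remaining();
        target.put(scratch); // copy 2: scratch buffer into the target vector's buffer
        copies++;
        return length;
    }

    // The extra lifecycle hook this option requires.
    void cleanup() {
        scratch = null;
    }
}
```

With heap buffers, dropping the reference in cleanup() is enough; with manually managed or off-heap memory, cleanup() is where an explicit release would have to go, which is why the hook is needed.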

Option 2:  the consumer of the cast function's output is responsible for
pre-allocating the buffer in the target ValueVector for all the
VarCharHolders. The cast function simply does the conversion and writes the
result into the pre-allocated buffer in the target ValueVector.
The advantage of this option is that it requires only 1 copy.
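
A minimal sketch of Option 2, again with hypothetical names: VarCharVectorSketch is a toy stand-in for a var-length ValueVector (a data buffer plus an offsets array), and CastIntoPreallocated.eval() writes the digits straight into the vector's data buffer, so there is no intermediate scratch buffer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a var-length vector; value i occupies
// data[offsets[i]] .. data[offsets[i + 1] - 1].
final class VarCharVectorSketch {
    final ByteBuffer data;
    final int[] offsets;
    int valueCount;

    // The consumer pre-allocates room for every value up front.
    VarCharVectorSketch(int maxValues, int maxBytesPerValue) {
        data = ByteBuffer.allocate(maxValues * maxBytesPerValue);
        offsets = new int[maxValues + 1];
    }

    String get(int index) {
        int start = offsets[index];
        byte[] bytes = new byte[offsets[index + 1] - start];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = data.get(start + i);
        }
        return new String(bytes, StandardCharsets.US_ASCII);
    }
}

final class CastIntoPreallocated {
    // int --> varchar, written directly into the pre-allocated vector buffer.
    static void eval(int value, VarCharVectorSketch out) {
        int pos = out.offsets[out.valueCount];
        long v = value; // widen so that negating Integer.MIN_VALUE is safe
        if (v < 0) {
            out.data.put(pos++, (byte) '-');
            v = -v;
        }
        int digits = 1;
        for (long t = v; t >= 10; t /= 10) {
            digits++;
        }
        // Fill the digits from least significant to most significant.
        for (int i = digits - 1; i >= 0; i--) {
            out.data.put(pos + i, (byte) ('0' + (v % 10)));
            v /= 10;
        }
        out.valueCount++;
        out.offsets[out.valueCount] = pos + digits;
    }
}
```

The open question in this option is sizing: for int --> varchar the consumer can reserve 11 bytes per value as an upper bound, but the sketch does not address how the consumer would learn that bound for an arbitrary cast function.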

I have prototyped the 1st option, and have not yet figured out how to
implement the 2nd. I would like your suggestions on these 2 options before
I proceed.

Thanks!
