drill-dev mailing list archives

From Sorabh Hamirwasia <sohami.apa...@gmail.com>
Subject Re: [Discuss] Integrate Arrow gandiva into Drill
Date Fri, 05 Apr 2019 16:27:15 GMT
Hi Weijie,
I think the only case in which that line would be executed is a UDF,
similar to the flatten operation, that produces multiple rows for each
input row. Flatten is currently a separate operator in Drill, but I think
that code is there to handle such cases.

Thanks,
Sorabh
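
The case described in the quoted thread below (the output row limit from
memoryManager.getOutputRowCount() being smaller than incomingRecordCount)
can be sketched as follows. This is only an illustration under stated
assumptions, not Drill's actual ProjectRecordBatch code: the class, the
methods, and the simplified projectRecords signature are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; it only mimics the control flow discussed in the thread.
class ProjectRemainderSketch {

    // Stand-in for a projector call: projects at most `limit` rows starting at
    // `startIndex` and returns how many rows were actually produced.
    static int projectRecords(int startIndex, int limit, int incomingRecordCount) {
        return Math.min(limit, incomingRecordCount - startIndex);
    }

    // Returns the size of every output batch produced for one incoming batch.
    static List<Integer> outputBatchSizes(int incomingRecordCount, int maxOutputRecordCount) {
        if (maxOutputRecordCount <= 0) {
            throw new IllegalArgumentException("maxOutputRecordCount must be positive");
        }
        List<Integer> sizes = new ArrayList<>();
        int remainderIndex = 0;
        while (remainderIndex < incomingRecordCount) {
            int remaining = incomingRecordCount - remainderIndex;
            int limit = Math.min(remaining, maxOutputRecordCount);
            int outputRecords = projectRecords(remainderIndex, limit, incomingRecordCount);
            sizes.add(outputRecords);
            // If outputRecords < remaining, some rows are left over and are
            // projected into the next output batch, starting at remainderIndex.
            remainderIndex += outputRecords;
        }
        return sizes;
    }

    public static void main(String[] args) {
        // The limit is below the incoming row count, so a remainder is carried over.
        System.out.println(outputBatchSizes(10, 4));
        // The limit is above the incoming row count, so one output batch is enough.
        System.out.println(outputBatchSizes(3, 4));
    }
}
```

In this sketch the "output smaller than input" branch is reached only when
the limit is below the incoming row count, which matches the explanation
given in the thread for why the code came into use with DRILL-6340.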

On Fri, Apr 5, 2019 at 6:08 AM weijie tong <tongweijie178@gmail.com> wrote:

> The comparison code first appeared in DRILL-620:
>
> https://github.com/apache/drill/commit/a2355d42dbff51b858fc28540915cf793f1c0fac#diff-e87beb3f2aa0fbc06b07b1d55c3d3536
>
> Before DRILL-6340, judging by ProjectorTemplate's projectRecords method
> and the actual argument values passed to it, I think line 234 of
> ProjectRecordBatch would never have been executed. Only since DRILL-6340,
> which controls the output batch memory size, has that code come into use.
>
> If I am wrong, please let me know.
>
> On Fri, Apr 5, 2019 at 12:15 AM weijie tong <tongweijie178@gmail.com>
> wrote:
>
> > Thanks for the reply. But it seems the code was there even before
> > DRILL-6340.
> >
> > On Thu, Apr 4, 2019 at 10:45 PM Vova Vysotskyi <vvovyk@gmail.com> wrote:
> >
> >> Hi Weijie,
> >>
> >> It is possible if maxOuputRecordCount (received from
> >> memoryManager.getOutputRowCount()) is less than incomingRecordCount.
> >> For more details please see DRILL-6340
> >> <https://issues.apache.org/jira/browse/DRILL-6340> and the design document
> >> <https://docs.google.com/document/d/1h0WsQsen6xqqAyyYSrtiAniQpVZGmQNQqC1I2DJaxAA/edit?usp=sharing>
> >> attached to this Jira.
> >>
> >> Kind regards,
> >> Volodymyr Vysotskyi
> >>
> >>
> >> On Thu, Apr 4, 2019 at 5:17 PM weijie tong <tongweijie178@gmail.com>
> >> wrote:
> >>
> >> > I have a question about the ProjectRecordBatch implementation; I hope
> >> > someone can explain it. At line 234 of ProjectRecordBatch, in what case
> >> > is the projector's output row count less than the input row count?
> >> >
> >> > On Thu, Apr 4, 2019 at 5:11 PM weijie tong <tongweijie178@gmail.com>
> >> > wrote:
> >> >
> >> > > Hi Igor,
> >> > > That's a good idea! It could resolve that issue, so the basic question
> >> > > is solved. To use the official Arrow, there are still two issues that
> >> > > need to be contributed to Arrow, which I will do:
> >> > > 1. Statically link the gcc library into the JNI dynamic library.
> >> > >    Without this, the platform must have the right version of gcc
> >> > >    installed.
> >> > > 2. Add a convertToNull function to Gandiva.
> >> > >    This would allow project expressions that use convertToNull to be
> >> > >    executed by Gandiva.
> >> > >
> >> > > Of course, even without these two issues solved, I could still provide
> >> > > an integration implementation.
> >> > >
> >> > > BTW, once the integration is done, how do we supply the Gandiva JNI
> >> > > library? Do we leave it to users to build it, or do we supply
> >> > > distributions for different platforms?
> >> > >
> >> > >
> >> > > On Thu, Apr 4, 2019 at 3:53 PM Igor Guzenko
> >> > > <ihor.huzenko.igs@gmail.com> wrote:
> >> > >
> >> > >> Hello Weijie,
> >> > >>
> >> > >> Did you try creating the same package as in Arrow, but inside Drill,
> >> > >> and using a wrapper class around the target to expose the desired
> >> > >> package-access methods?
> >> > >>
> >> > >> Thanks, Igor
> >> > >>
> >> > >> On Thu, Apr 4, 2019 at 9:51 AM weijie tong
> >> > >> <tongweijie178@gmail.com> wrote:
> >> > >> >
> >> > >> > Hi,
> >> > >> >
> >> > >> > Gandiva is a subproject of Arrow. Using LLVM codegen and SIMD
> >> > >> > techniques, Gandiva can achieve better query performance. Arrow and
> >> > >> > Drill have similar columnar memory formats; the main difference now
> >> > >> > is the null representation. Arrow has also made significant changes
> >> > >> > to ValueVector. Adopting Arrow to replace Drill's VV has been
> >> > >> > discussed before, and that would be a big job. But leveraging
> >> > >> > Gandiva by working at the physical memory address level would take
> >> > >> > relatively little work.
> >> > >> >
> >> > >> > I have now done the integration work on our own branch by making
> >> > >> > some changes to the Arrow branch, and have filed DRILL-7087 and
> >> > >> > ARROW-4819. The main change in ARROW-4819 is to make some
> >> > >> > package-level methods public. But the Arrow community does not seem
> >> > >> > to plan to accept this change. Their advice is to maintain an Arrow
> >> > >> > branch.
> >> > >> >
> >> > >> > So what do you think?
> >> > >> >
> >> > >> > 1. Maintain our own branch of Arrow.
> >> > >> > 2. Wait for the full Arrow integration.
> >> > >> > Or are there other ideas?
> >> > >>
> >> > >
> >> >
> >>
> >
>
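
Igor's suggestion in the thread above (a wrapper class declared in a
package with the same name as the library's package, so that it can reach
package-access methods) can be sketched roughly as below. All names here
are hypothetical stand-ins and not Arrow's or Drill's API. For brevity both
classes sit in one file and therefore in one package; in practice the
library class would be public and the wrapper would live in the consuming
project's source tree under the same package name.

```java
// Hypothetical stand-in for a library class whose useful method has
// package access only.
class LibraryBuffer {
    private final long address;

    LibraryBuffer(long address) {
        this.address = address;
    }

    // No access modifier: callable only from classes in the same package.
    long memoryAddress() {
        return address;
    }
}

// Hypothetical wrapper declared in the same package as LibraryBuffer.
// It republishes the package-access method through a public static method,
// so code in other packages can call it without patching the library.
final class LibraryBufferAccessor {
    private LibraryBufferAccessor() {
        // Utility class, not meant to be instantiated.
    }

    public static long memoryAddress(LibraryBuffer buffer) {
        return buffer.memoryAddress();
    }
}
```

One design caveat: this approach relies on the package being open to
additional classes. It would not work if the library's JAR seals the
package, or if both sides are named Java modules, since the module system
does not allow the same package to be split across modules.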
