drill-dev mailing list archives

From Hanifi Gunes <hgu...@maprtech.com>
Subject Re: question about correlated arrays and flatten
Date Tue, 02 Jun 2015 18:19:52 GMT
That's right. I guess that's what I am proposing to have here implicitly. I
am not sure how feasible this would be; however, we should be able to
interpret inline lambda-like expressions. This is something to discuss as
we improve Drill's complex data handling capabilities. I see great value
here, especially for computationally intensive workloads.

select fold(t.numbers, 0, (r, c) => r + c), map(t.numbers, (n) => n*n)
from dfs.`some/table` t

-Hanifi
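To make the proposed semantics concrete, here is a plain-Java sketch (Java 8+, `java.util.function` only) of what `fold` and `map` in the query above would compute over one row's array. It is not Drill's UDF API. The class and method names are hypothetical, and the sample data is made up. `foldRight` is included only to illustrate the left/right folding distinction raised in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

public class FunctionalPrimitives {

    // Left fold: combines from first to last, ((init op x0) op x1) op ...
    static <T, R> R foldLeft(List<T> xs, R init, BiFunction<R, T, R> op) {
        R acc = init;
        for (T x : xs) {
            acc = op.apply(acc, x);
        }
        return acc;
    }

    // Right fold: combines from last to first, x0 op (x1 op (... op init))
    static <T, R> R foldRight(List<T> xs, R init, BiFunction<T, R, R> op) {
        R acc = init;
        for (int i = xs.size() - 1; i >= 0; i--) {
            acc = op.apply(xs.get(i), acc);
        }
        return acc;
    }

    // Map: applies f to every element, keeping order and length
    static <T, R> List<R> map(List<T> xs, Function<T, R> f) {
        List<R> out = new ArrayList<>(xs.size());
        for (T x : xs) {
            out.add(f.apply(x));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4);  // one row's t.numbers (made-up data)

        int sum = foldLeft(numbers, 0, (r, c) -> r + c);     // like fold(t.numbers, 0, (r, c) => r + c)
        List<Integer> squares = map(numbers, n -> n * n);    // like map(t.numbers, (n) => n*n)
        System.out.println(sum);      // 10
        System.out.println(squares);  // [1, 4, 9, 16]

        // Left and right folds differ when the operator is not associative
        int left = foldLeft(numbers, 0, (r, c) -> r - c);    // (((0-1)-2)-3)-4
        int right = foldRight(numbers, 0, (c, r) -> c - r);  // 1-(2-(3-(4-0)))
        System.out.println(left);     // -10
        System.out.println(right);    // -2
    }
}
```

With `+` and an initial value of 0, both fold directions give 10. The subtraction case shows why supporting both left and right folding matters.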

On Mon, Jun 1, 2015 at 3:28 PM, Ted Dunning <ted.dunning@gmail.com> wrote:

> How could we make functional primitives work without lambdas?
>
>
>
> On Mon, Jun 1, 2015 at 9:55 PM, Hanifi Gunes <hgunes@maprtech.com> wrote:
>
> > The idea of having functional primitives in Drill sounds really handy. It
> > would be great if we could support left and right folding as well. I can
> > see many great use cases for project/map, fold/reduce, zip, and flatten
> > when combined.
> >
> > On Sat, May 30, 2015 at 12:57 AM, Ted Dunning <ted.dunning@gmail.com>
> > wrote:
> >
> > > OK. I will file a JIRA for a zip function. No idea if I will be able
> > > to get one written in the available cracks of time.
> > >
> > >
> > >
> > > On Fri, May 29, 2015 at 7:17 PM, Steven Phillips <sphillips@maprtech.com>
> > > wrote:
> > >
> > > > I think your use case could be solved by adding a UDF that can combine
> > > > multiple arrays into a single array. The result of this function could
> > > > then be handled by our current implementation of flatten.
> > > >
> > > > I think this is preferable to enhancing flatten itself to handle it,
> > > > since flatten is not an ordinary UDF and is thus more difficult to
> > > > modify and maintain.
> > > >
> > > > On Fri, May 29, 2015 at 3:20 PM, Ted Dunning <ted.dunning@gmail.com>
> > > > wrote:
> > > >
> > > > > My particular use case can throw an error if the lists are of different
> > > > > lengths.
> > > > >
> > > > > I think our real goal should be to have a logically complete set of
> > > > > simple primitives that allows any sort of back-and-forth conversion of
> > > > > this kind.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Fri, May 29, 2015 at 9:58 AM, Jason Altekruse <altekrusejason@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > I understand what you want to do; unfortunately, we don't have support
> > > > > > for this right now. A UDF is the best I can suggest at this point.
> > > > > >
> > > > > > Just to explore the idea a little further, for the sake of creating a
> > > > > > complete feature request: I assume you would just want nulls filled in
> > > > > > for the cases where the lists were different lengths?
> > > > > >
> > > > > > On Fri, May 29, 2015 at 8:58 AM, Ted Dunning <ted.dunning@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Input is here:
> > > > > > > https://gist.github.com/tdunning/07ce66e7e4d4af41afd7
> > > > > > >
> > > > > > > Output is here:
> > > > > > > https://gist.github.com/tdunning/3aa841c56bfcdc0ab90e
> > > > > > >
> > > > > > > log-synth schema for generating input data is here:
> > > > > > > https://gist.github.com/tdunning/638dd52c00569ffa9582
> > > > > > >
> > > > > > >
> > > > > > > Preferred syntax would be something like:
> > > > > > >
> > > > > > > select flatten(t, v1, v2) from ...
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Fri, May 29, 2015 at 7:04 AM, Neeraja Rentachintala
> > > > > > > <nrentachintala@maprtech.com> wrote:
> > > > > > >
> > > > > > > > Ted,
> > > > > > > > can you please give an example with a few data elements in a and b,
> > > > > > > > and the expected output you are looking for from the query?
> > > > > > > >
> > > > > > > > -Neeraja
> > > > > > > >
> > > > > > > > On Fri, May 29, 2015 at 6:43 AM, Ted Dunning <ted.dunning@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > I have two arrays. Their elements are correlated times and values.
> > > > > > > > > I would like to flatten them into rows, each with two elements.
> > > > > > > > >
> > > > > > > > > The query
> > > > > > > > >
> > > > > > > > >    select flatten(a), flatten(b) from ...
> > > > > > > > >
> > > > > > > > > doesn't work because I get the Cartesian product (of course). The
> > > > > > > > > query
> > > > > > > > >
> > > > > > > > >    select flatten(a, b) from ...
> > > > > > > > >
> > > > > > > > > also doesn't work because flatten doesn't have a multi-argument form.
> > > > > > > > >
> > > > > > > > > Going crazy, this query kind of sort of almost works, but not really:
> > > > > > > > >
> > > > > > > > >      select r.x.`key`, flatten(r.x.`value`) from (
> > > > > > > > >          select flatten(kvgen(x)) as x from ...) r;
> > > > > > > > >
> > > > > > > > > What I really want to see is something like this:
> > > > > > > > >    select zip(flatten(a), flatten(b)) from ...
> > > > > > > > >
> > > > > > > > > Any pointers?  Is my next step to write a UDF?
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > >  Steven Phillips
> > > >  Software Engineer
> > > >
> > > >  mapr.com
> > > >
> > >
> >
>
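The zip semantics debated in this thread can also be sketched in plain Java. This sketch assumes the two correlated arrays arrive as Java lists of times (`Long`) and values (`Double`). The names `ZipSketch`, `Policy`, `Row`, `zip`, and `cartesian` are hypothetical, the data is made up, and this is not Drill's UDF interface. `STRICT` mirrors Ted's option of throwing an error when the lists differ in length. `PAD_WITH_NULLS` mirrors Jason's null-filling suggestion. `cartesian` reproduces the row explosion Ted reports from `select flatten(a), flatten(b)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ZipSketch {

    // How to handle arrays of different lengths (hypothetical policy names)
    enum Policy { STRICT, PAD_WITH_NULLS }

    // One output row: a correlated (time, value) pair; either side may be null when padding
    static final class Row {
        final Long time;
        final Double value;

        Row(Long time, Double value) {
            this.time = time;
            this.value = value;
        }

        @Override
        public String toString() {
            return "{t=" + time + ", v=" + value + "}";
        }
    }

    // Zip: pairs the i-th time with the i-th value, giving one row per index
    static List<Row> zip(List<Long> times, List<Double> values, Policy policy) {
        if (policy == Policy.STRICT && times.size() != values.size()) {
            throw new IllegalArgumentException(
                "arrays differ in length: " + times.size() + " vs " + values.size());
        }
        int n = Math.max(times.size(), values.size());
        List<Row> rows = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            // Past the end of the shorter list, fill in null
            Long t = i < times.size() ? times.get(i) : null;
            Double v = i < values.size() ? values.get(i) : null;
            rows.add(new Row(t, v));
        }
        return rows;
    }

    // Cartesian product: one row per (time, value) combination
    static List<Row> cartesian(List<Long> times, List<Double> values) {
        List<Row> rows = new ArrayList<>();
        for (Long t : times) {
            for (Double v : values) {
                rows.add(new Row(t, v));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Long> times = Arrays.asList(100L, 200L, 300L);
        List<Double> values = Arrays.asList(1.5, 2.5, 3.5);
        // [{t=100, v=1.5}, {t=200, v=2.5}, {t=300, v=3.5}]
        System.out.println(zip(times, values, Policy.STRICT));
        // 9
        System.out.println(cartesian(times, values).size());
        // [{t=100, v=1.5}, {t=200, v=null}, {t=300, v=null}]
        System.out.println(zip(times, Arrays.asList(1.5), Policy.PAD_WITH_NULLS));
    }
}
```

With three times and three values, `zip` yields 3 correlated rows where the Cartesian product yields 9. Following Steven's suggestion, a real Drill UDF would presumably emit the zipped pairs as a single array (for example, of two-field maps) so that the existing flatten could turn them into rows. That Drill-specific wiring is not shown here.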
