drill-user mailing list archives

From Alexandre Rodrigues <alex.jose.rodrig...@gmail.com>
Subject Re: Custom functions with arbitrary number of arguments + improving physical plan on known cases
Date Mon, 18 Aug 2014 16:22:35 GMT

Thanks for the answers.

Consider the following use case. We define a function called
InsidePolygon that receives the X and Y of a point, followed by the x,y
coordinates of the vertices of a polygon. That would be a use case for
var-args. Alternatively, we can pass the polygon as a string in the third
argument, deserialize it (e.g. split by comma) and use it to define the
polygon.

SELECT s.id FROM sometable s
WHERE InsidePolygon(s.x, s.y, '0,0,0,1,1,1,1,0')

Would this function instance be created multiple times, or just once on
each drillbit and then fed with x,y pairs?
Is there any way of efficiently generating the code of a function that
checks for that specific polygon? In this case you could simply convert the
function code to a simple predicate such as:

(s.x >= 0.0 && s.x <= 1.0) && (s.y >= 0.0 && s.y <= 1.0)
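
To make the idea concrete, here is a minimal sketch in Python (not Drill's
Java function framework) of parsing the polygon string once and handing back
a predicate that is then fed x,y pairs. All names here (parse_polygon,
make_inside_polygon, ...) are hypothetical. When the polygon is an
axis-aligned rectangle, the sketch returns the simple bounds predicate
above; otherwise it falls back to a generic ray-casting test, whose
treatment of points exactly on the boundary is not guaranteed.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def parse_polygon(spec: str) -> List[Point]:
    """Turn 'x0,y0,x1,y1,...' into a list of (x, y) vertices."""
    values = [float(v) for v in spec.split(",")]
    if len(values) < 6 or len(values) % 2 != 0:
        raise ValueError("expected at least three x,y pairs")
    return list(zip(values[0::2], values[1::2]))


def is_axis_aligned_rectangle(vertices: List[Point]) -> bool:
    """True if the vertices are the 4 corners of an axis-aligned rectangle, in order."""
    if len(vertices) != 4 or len(set(vertices)) != 4:
        return False
    for i in range(4):
        (x1, y1), (x2, y2) = vertices[i], vertices[(i + 1) % 4]
        # Each side must be horizontal or vertical: exactly one coordinate changes.
        if (x1 == x2) == (y1 == y2):
            return False
    return True


def make_inside_polygon(spec: str) -> Callable[[float, float], bool]:
    """Parse the polygon once and return a predicate specialised for it."""
    vertices = parse_polygon(spec)

    if is_axis_aligned_rectangle(vertices):
        min_x = min(x for x, _ in vertices)
        max_x = max(x for x, _ in vertices)
        min_y = min(y for _, y in vertices)
        max_y = max(y for _, y in vertices)
        # Same shape as the hand-written predicate: inclusive bounds checks.
        return lambda x, y: min_x <= x <= max_x and min_y <= y <= max_y

    def ray_cast(x: float, y: float) -> bool:
        # Count crossings of a ray going in the +x direction from (x, y).
        inside = False
        n = len(vertices)
        for i in range(n):
            x1, y1 = vertices[i]
            x2, y2 = vertices[(i + 1) % n]
            if (y1 > y) != (y2 > y):
                x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
                if x < x_cross:
                    inside = not inside
        return inside

    return ray_cast


inside_square = make_inside_polygon("0,0,0,1,1,1,1,0")   # built once
inside_triangle = make_inside_polygon("0,0,4,0,0,4")     # generic fallback
```

The point of the closure is that the string is split and analysed a single
time, and only the cheap predicate runs per row. Whether Drill can do the
equivalent for a constant argument is exactly the question above.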

Regarding the second question, and using the above use case, I'm thinking
of using geohashing to limit the subset of records to read. Imagine I just
want to query the HBase rows whose row_key starts with some value in a set
given by a function.

Imagine we have a function like:

GeoHashByPoly('0,0,0,1,1,1,1,0') that returns a set of geohashes.

And we just want to read the ranges that start with the values in that set.
What's the best way of hinting the HBase storage plugin to receive this
information when planning the ranges?
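
As an illustration of what "ranges that start with the values in that set"
could mean, here is a small Python sketch that turns a set of geohash
prefixes into [start, stop) row-key ranges, relying only on the fact that an
HBase scan takes an inclusive start row and an exclusive stop row. It
assumes the row_key begins with the geohash as ASCII text; the function
names are hypothetical and nothing here is Drill or HBase plugin API.

```python
from typing import Iterable, List, Optional, Tuple

KeyRange = Tuple[bytes, Optional[bytes]]  # (inclusive start, exclusive stop or None)


def prefix_stop_key(prefix: bytes) -> Optional[bytes]:
    """Smallest key greater than every key starting with `prefix`.

    Returns None when no such key exists (empty or all-0xFF prefix),
    meaning the range is open-ended.
    """
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return None
    # Increment the last byte that can still be incremented.
    return trimmed[:-1] + bytes([trimmed[-1] + 1])


def prefixes_to_ranges(prefixes: Iterable[str]) -> List[KeyRange]:
    """Convert geohash prefixes into sorted, non-overlapping scan ranges."""
    keys = sorted({p.encode("ascii") for p in prefixes})
    ranges: List[KeyRange] = []
    last_prefix: Optional[bytes] = None
    for key in keys:
        # A longer prefix is already covered by a shorter one that was kept.
        if last_prefix is not None and key.startswith(last_prefix):
            continue
        last_prefix = key
        start, stop = key, prefix_stop_key(key)
        if ranges and ranges[-1][1] == start:
            # Adjacent ranges: merge them into a single scan.
            ranges[-1] = (ranges[-1][0], stop)
        else:
            ranges.append((start, stop))
    return ranges


scan_ranges = prefixes_to_ranges(["ezs4", "ez", "u0", "ezs5"])
```

Each resulting pair would correspond to one sub-range scan; the open
question is how to get such a list to the storage plugin at planning time.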

Alexandre Rodrigues

On Mon, Aug 18, 2014 at 4:55 PM, Aditya <adityakishore@gmail.com> wrote:

> >
> > *Question 1* – how to implement a function with arbitrary arguments
> >
> > I am trying to implement a function that accepts an arbitrary number of
> > arguments, like the following:
> >
> >
> > SELECT * FROM somesource ss
> > WHERE foo(ss.a, ss.b, 1, 2, 3, 4)
> >
> > Foo function will return a boolean.
> > How can I have a SimpleFunction accept more than 2 arguments (left and
> > right)?
> >
> Currently Drill supports functions with only a fixed number of arguments.
> You can have more than 2 arguments in a function, just that the number is
> fixed when it is defined. We have talked about extending the Drill function
> framework to support var-args but no progress has been made.
> Could you please let us know which functions you have in mind that could
> use such a capability?
>
> > *Question 2* - if there's a way to limit the search scope with an
> > heuristic (e.g. scan range in a HBase table), how can I hint the runtime
> > or 'affect' the physical plan through custom code? Is this possible?
> >
> Drill can automatically convert certain WHERE clauses into sub-range scans
> (if the WHERE clause is on the 'row_key' column) and/or attach HBase
> filters to the scan. Please see [1] and [2].
> [1] https://issues.apache.org/jira/browse/DRILL-571
> [2] https://issues.apache.org/jira/browse/DRILL-783
