drill-dev mailing list archives

From Kunal Khatua <kkha...@maprtech.com>
Subject RE: Q about current capabilities
Date Fri, 05 Sep 2014 21:25:01 GMT
In its current avatar, Drill can do this like it does for CSV data sources:

0: jdbc:drill:schema=dfs.m7> select * from nation_mdb100 n, region_mdb100 r
. . . . . . . . . . . . . .> where r.r_regionkey >=2 and r.r_regionkey <= 3
. . . . . . . . . . . . . .> and n.n_regionkey = r.r_regionkey;
+-------------+------------+-------------+------------+-------------+------------+------------+
| n_nationkey |   n_name   | n_regionkey | n_comment  | r_regionkey |   r_name   | r_comment  |
+-------------+------------+-------------+------------+-------------+------------+------------+
| 19          | ROMANIA    | 3           | ular asymptotes are about the furious multipliers. express dependencies nag above the ironically ironic account | 3           | EUROPE     | ly final courts cajole furiously final excuse |
| 22          | RUSSIA     | 3           |  requests against the platelets use never according to the quickly regular pint | 3           | EUROPE     | ly final courts cajole furiously final excuse |
| 23          | UNITED KINGDOM | 3           | eans boost carefully special requests. accounts are. carefull | 3           | EUROPE     | ly final courts cajole furiously final excuse |
| 6           | FRANCE     | 3           | refully final requests. regular, ironi | 3           | EUROPE     | ly final courts cajole furiously final excuse |
| 7           | GERMANY    | 3           | l platelets. regular accounts x-ray: unusual, regular acco | 3           | EUROPE     | ly final courts cajole furiously final excuse |
| 12          | JAPAN      | 2           | ously. final, express gifts cajole a | 2           | ASIA       | ges. thinly even pinto beans ca |
| 18          | CHINA      | 2           | c dependencies. furiously express notornis sleep slyly regular accounts. ideas sleep. depos | 2           | ASIA       | ges. thinly even pinto beans ca |
| 21          | VIETNAM    | 2           | hely enticingly express accounts. even, final  | 2           | ASIA       | ges. thinly even pinto beans ca |
| 8           | INDIA      | 2           | ss excuses cajole slyly across the packages. deposits print aroun | 2           | ASIA       | ges. thinly even pinto beans ca |
| 9           | INDONESIA  | 2           |  slyly express asymptotes. regular deposits haggle slyly. carefully ironic hockey players sleep blithely. carefull | 2           | ASIA       | ges. thinly even pinto beans ca |
+-------------+------------+-------------+------------+-------------+------------+------------+
10 rows selected (3.378 seconds)

As with the CSV source, I've used views that cast the String-based keys to
numeric values. The range filters are not pushed all the way down to M7, so
Drill must read all the keys. I believe there is a plan to support filter
push-down, provided the data stored in the table is a byte representation of
a numeric data type and not a String (as I have it).
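
A likely reason the encoding matters for push-down is key ordering: a sorted key-value table compares row keys byte by byte, so a range filter can only become a range scan if byte order agrees with numeric order. The sketch below is plain Java, not Drill or M7 code. The class and method names are hypothetical, and it assumes non-negative integer keys with unsigned byte-wise comparison.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class KeyOrderSketch {
    // Unsigned, byte-by-byte comparison: the ordering assumed here for row keys.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    // Key stored as its decimal String, the way the keys are stored in my tables.
    static byte[] asString(int v) {
        return Integer.toString(v).getBytes(StandardCharsets.UTF_8);
    }

    // Key stored as a 4-byte big-endian integer (ByteBuffer's default byte order).
    // Order-preserving only for non-negative values under unsigned comparison.
    static byte[] asBigEndian(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    public static void main(String[] args) {
        // String keys: "10" sorts before "2", so numeric order is lost.
        System.out.println(compareBytes(asString(10), asString(2)) < 0);
        // Big-endian keys: 10 sorts after 2, so a numeric range is a contiguous key range.
        System.out.println(compareBytes(asBigEndian(10), asBigEndian(2)) > 0);
    }
}
```

With String keys, the rows matching a filter such as key >= 2 and key <= 3 are not guaranteed to be contiguous in the table, which is consistent with Drill having to read all the keys.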

~ Kunal


-----Original Message-----
From: Ted Dunning [mailto:ted.dunning@gmail.com]
Sent: Thursday, September 04, 2014 10:46 PM
To: drill
Subject: Q about current capabilities

How close is Drill to being able to do the following?

    select * from primary_table, index_table
    where index_table.key >= limit1 and index_table.key <= limit2
        and primary_table.key = index_table.ref

where both primary_table and index_table are MapR DB tables?

In both tables, the primary key is called key, and the ref field of
index_table is exactly the key of primary_table.

I have prototyped this query in Java using the simplest possible
implementation: I scanned index_table for values of ref, inserted every
value of ref into a table of tasks, and executed those tasks using a
thread-bound worker pool.  Performance was quite acceptable for the desired
application.
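
The approach described above can be sketched roughly as follows. This is not the original prototype: the class name, the method signature, and the in-memory maps standing in for the two MapR DB tables are all assumptions made for illustration, and a fixed-size thread pool is assumed for the worker pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class IndexJoinSketch {
    // indexTable: key -> ref; primaryTable: key -> row value.
    // Both are hypothetical in-memory stand-ins for the real tables.
    static List<String> join(NavigableMap<Integer, Integer> indexTable,
                             Map<Integer, String> primaryTable,
                             int limit1, int limit2, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<String>> tasks = new ArrayList<>();
            // Range scan of the index: limit1 <= key <= limit2.
            for (Map.Entry<Integer, Integer> e
                    : indexTable.subMap(limit1, true, limit2, true).entrySet()) {
                final int indexKey = e.getKey();
                final int ref = e.getValue();
                // One point lookup per ref, submitted in scan order; refs are not sorted.
                Callable<String> lookup = () -> {
                    String row = primaryTable.get(ref);
                    return row == null ? null : indexKey + "," + ref + "," + row;
                };
                tasks.add(pool.submit(lookup));
            }
            List<String> rows = new ArrayList<>();
            for (Future<String> task : tasks) {
                String row = task.get();
                if (row != null) {      // inner join: drop refs with no primary row
                    rows.add(row);
                }
            }
            return rows;
        } catch (InterruptedException | ExecutionException ex) {
            throw new IllegalStateException(ex);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        NavigableMap<Integer, Integer> index = new TreeMap<>();
        index.put(1, 30);
        index.put(2, 10);
        index.put(3, 20);
        Map<Integer, String> primary = Map.of(10, "a", 20, "b", 30, "c");
        System.out.println(join(index, primary, 2, 3, 4));
    }
}
```

Results are collected in the order the index was scanned, so the output is ordered by index key even though the lookups themselves run concurrently.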

This test indicates to me that we wouldn't even need to sort the references
from index_table for them to be handled nicely by a single thread.  Nor
would it be strictly necessary to distribute the computation, although that
would be fun.

Your thoughts?
