In its current form, Drill can already do this, much as it does for CSV data sources:
0: jdbc:drill:schema=dfs.m7> select * from nation_mdb100 n, region_mdb100 r
. . . . . . . . . . . . . .> where r.r_regionkey >=2 and r.r_regionkey <= 3
. . . . . . . . . . . . . .> and n.n_regionkey = r.r_regionkey;
+-------------+------------+-------------+------------+-------------+------------+------------+
| n_nationkey | n_name | n_regionkey | n_comment | r_regionkey | r_name | r_comment |
+-------------+------------+-------------+------------+-------------+------------+------------+
| 19 | ROMANIA | 3 | ular asymptotes are about the furious multipliers. express dependencies nag above the ironically ironic account | 3 | EUROPE | ly final courts cajole furiously final excuse |
| 22 | RUSSIA | 3 | requests against the platelets use never according to the quickly regular pint | 3 | EUROPE | ly final courts cajole furiously final excuse |
| 23 | UNITED KINGDOM | 3 | eans boost carefully special requests. accounts are. carefull | 3 | EUROPE | ly final courts cajole furiously final excuse |
| 6 | FRANCE | 3 | refully final requests. regular, ironi | 3 | EUROPE | ly final courts cajole furiously final excuse |
| 7 | GERMANY | 3 | l platelets. regular accounts x-ray: unusual, regular acco | 3 | EUROPE | ly final courts cajole furiously final excuse |
| 12 | JAPAN | 2 | ously. final, express gifts cajole a | 2 | ASIA | ges. thinly even pinto beans ca |
| 18 | CHINA | 2 | c dependencies. furiously express notornis sleep slyly regular accounts. ideas sleep. depos | 2 | ASIA | ges. thinly even pinto beans ca |
| 21 | VIETNAM | 2 | hely enticingly express accounts. even, final | 2 | ASIA | ges. thinly even pinto beans ca |
| 8 | INDIA | 2 | ss excuses cajole slyly across the packages. deposits print aroun | 2 | ASIA | ges. thinly even pinto beans ca |
| 9 | INDONESIA | 2 | slyly express asymptotes. regular deposits haggle slyly. carefully ironic hockey players sleep blithely. carefull | 2 | ASIA | ges. thinly even pinto beans ca |
+-------------+------------+-------------+------------+-------------+------------+------------+
10 rows selected (3.378 seconds)
As with the CSV source, I've used views that cast the String-based keys to
numeric values. The range filters are not pushed all the way down to M7, so
Drill must read all the keys. I believe there is a plan to support filter
push-down, provided the data stored in the table is a byte representation
of a numeric data type and not a String (as I have it).
~ Kunal
-----Original Message-----
From: Ted Dunning [mailto:ted.dunning@gmail.com]
Sent: Thursday, September 04, 2014 10:46 PM
To: drill
Subject: Q about current capabilities
How close is Drill to being able to do the following?
select * from primary_table, index_table
where index_table.key >= limit1 and index_table.key <= limit2
and primary_table.key = index_table.ref
where both primary_table and index_table are MapR DB tables?
In both tables, the primary key is called key, and the ref field of
index_table is exactly the key of the primary_table.
I have prototyped this query using Java and the simplest possible
implementation in which I scanned the index_table for values of ref and then
inserted every value of ref into a table of tasks which I executed using a
thread bound worker pool. Performance was quite acceptable for the desired
application.
This test indicates to me that we wouldn't even need to sort the references
from index_table to be handled nicely by a single thread. Nor would it even
strictly be necessary to distribute the computation although that would be
fun.
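A minimal sketch of the kind of prototype described above, assuming in-memory maps stand in for the two MapR DB tables. The class name IndexJoinSketch, the method join, and the sample data are hypothetical; a real version would replace the TreeMap range scan and the Map lookup with calls to the table client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class IndexJoinSketch {

    // Range-scan the (stand-in) index table, then fetch each referenced
    // primary row as a task on a fixed-size worker pool.
    static List<String> join(NavigableMap<Integer, String> indexTable,
                             Map<String, String> primaryTable,
                             int limit1, int limit2, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> tasks = new ArrayList<>();
            // index_table.key >= limit1 and index_table.key <= limit2
            for (String ref : indexTable.subMap(limit1, true, limit2, true).values()) {
                // primary_table.key = index_table.ref; refs are not sorted first
                Callable<String> lookup = () -> primaryTable.get(ref);
                tasks.add(pool.submit(lookup));
            }
            List<String> rows = new ArrayList<>();
            for (Future<String> task : tasks) {
                String row = task.get();   // wait for each lookup to finish
                if (row != null) {
                    rows.add(row);         // skip refs with no primary row
                }
            }
            return rows;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample data: index key -> ref, primary key -> row.
        NavigableMap<Integer, String> index = new TreeMap<>();
        index.put(1, "k-a");
        index.put(2, "k-b");
        index.put(3, "k-c");
        index.put(4, "k-d");
        Map<String, String> primary =
                Map.of("k-a", "row-a", "k-b", "row-b", "k-c", "row-c");
        System.out.println(join(index, primary, 2, 3, 4));
    }
}
```

Results come back in the order the lookups were submitted, which here is index-key order; nothing depends on the refs themselves being sorted.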
Your thoughts?