drill-user mailing list archives

From Piotr Sokólski <pietraswit...@gmail.com>
Subject Estimating cost for fs tables, table names after joins and executing logical plans
Date Sat, 23 May 2015 15:13:49 GMT
Hi, I’ve been playing a bit with v1.0.0 and have run into a few questions/issues:

1. For query cost estimation one usually needs some additional information about a table, such
as the number of rows. Is cost estimation implemented for fs sources as well? If so, how is
the metadata extracted and cached? As I understand it, some formats like Parquet store it in
the file footer, but what about JSON or CSV files? Can the user query or retrieve this
information somehow?

2. I’ve been working with the following query:  

$q = select * from region join nation on region.R_REGIONKEY = nation.N_REGIONKEY;

where region and nation are the sample data files imported into the dfs.tmp schema.
Running queries like

select R_REGIONKEY from ($q);

results in an error: "Column 'R_REGIONKEY' is ambiguous". However, queries like select R_REGIONKEY
from (SELECT * FROM region); work fine, as does saving the result of the join with CREATE
TABLE and then replacing $q with the saved table’s name. Why is that, and what are the rules
for renaming columns in join queries?
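
To make the failing case concrete, the sketch below writes out the literal substitution of $q into the outer query, followed by a hypothetical variant that names the join's output columns explicitly. The alias t and the explicit column list are assumptions added for illustration; whether the second form avoids the ambiguity error in Drill 1.0.0 has not been verified here.

```sql
-- Literal substitution of $q into the outer query. This is the form
-- reported above to fail with "Column 'R_REGIONKEY' is ambiguous".
SELECT R_REGIONKEY
FROM (
  SELECT *
  FROM region
  JOIN nation ON region.R_REGIONKEY = nation.N_REGIONKEY
);

-- Hypothetical variant (untested assumption): replace SELECT * with an
-- explicit, aliased column list so the subquery exposes each name once.
-- The alias "t" is introduced here for illustration only.
SELECT t.R_REGIONKEY
FROM (
  SELECT region.R_REGIONKEY AS R_REGIONKEY,
         nation.N_REGIONKEY AS N_REGIONKEY
  FROM region
  JOIN nation ON region.R_REGIONKEY = nation.N_REGIONKEY
) t;
```

Both statements assume that region and nation resolve in the current schema (dfs.tmp in the setup described above).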

3. I’ve been trying to execute a logical plan using the web interface. It works fine with
a simple scan-project query, but when I use the output of EXPLAIN … FOR $q (with
resultMode changed to “EXEC”), it throws the following error:

SYSTEM ERROR: java.lang.IllegalArgumentException: Conflicting property-based creators: already
had [constructor for org.apache.drill.common.logical.data.Join,  ...

The whole logical query and the full error message are at https://gist.github.com/pyetras/bf625b6697de62284996

4. What are the supported conditions for joins? The SQL interface seems to support only (e1
== e2 [AND])*, but the logical operator reference at https://docs.google.com/document/d/1QTL8warUYS2KjldQrGUse7zp8eA72VKtLOHwfXy6c7I/mobilebasic?pli=1#cmnt7
mentions other relations and also Cartesian joins. Are those simply not implemented in the
SQL parser, or not supported in Drill at all?

Sorry for the long read and thanks for your assistance,  

Piotr Sokólski
