correction: 4 (b) query should have a non-equality condition instead of
equality:
select n1.n_nationkey from nation n1, (select min(n_regionkey) as
min_regionkey from nation n) n2
where n1.n_regionkey < n2.min_regionkey;
On Sat, May 23, 2015 at 10:28 AM, Aman Sinha <asinha@maprtech.com> wrote:
> 1. Yes, certainly the cost estimation is applied for file system sources
> as well. For Parquet the row count is
> extracted from the metadata available in the files. For text (csv,
> json), this is estimated based on the size
> of the files and a configurable
> *store.text.estimated_row_size_bytes* parameter. The default value is
> 100
> bytes per row, but you can change it.
>
> 2. There are 2 things in your query that need special handling in Drill
> because it does not rely on a centralized
> schema.
> First, the presence of '*' wildcard in the SELECT list - the '*' is
> expanded at run-time unlike the schema-based
> systems where it is expanded at the initial query planning time.
> Second, whenever there is a join between 2
> or more tables, the columns in the query need to be qualified by the
> table name because a column could
> potentially be present in either of the tables.
> The following query which explicitly specifies the qualified columns
> in SELECT clause should work:
> select r_regionkey from (
> select r.r_regionkey, n.n_regionkey from region r join nation n
> on r.r_regionkey = n.n_regionkey)
>
> 3. You might want to file a JIRA with the exact reproducible test case.
> 4. I believe the link you sent for the plan document is quite old. The
> best place for the latest documentation is
> drill.apache.org .
> Non-equality joins are supported in 2 situations:
> a) if there is another equality join between the same 2 tables.
> e.g
> select n1.n_nationkey from nation n1, nation n2
> where n1.n_nationkey = n2.n_nationkey and n1.n_regionkey <
> n2.n_regionkey
> b) if one side of the join is a scalar subquery:
> select n1.n_nationkey from nation n1, (select min(n_regionkey)
> as min_regionkey from nation n) n2
> where n1.n_regionkey = n2.min_regionkey;
>
> Pure cartesian joins are currently not supported.
>
> Aman
>
>
> On Sat, May 23, 2015 at 8:13 AM, Piotr Sokólski <pietraswithin@gmail.com>
> wrote:
>
>> Hi, I’ve been playing a bit with v1.0.0 and stumbled upon a few
>> questions/issues:
>>
>> 1. For query cost estimation one usually needs some additional
>> information about a table such as the number of rows. Is the cost
>> estimation implemented for fs sources as well? If yes, how is the metadata
>> extracted and cached? From my understanding some formats like parquet store
>> it in the file footer, but what about json or csv files? Can this
>> information be queried/retrieved somehow by the user?
>>
>> 2. I’ve been working with the following query:
>>
>> $q = select * from region join nation on region.R_REGIONKEY =
>> nation.N_REGIONKEY;
>>
>> where region and nation are the sample data files imported into a dfs.tmp
>> schema.
>> running queries like
>>
>> select R_REGIONKEY from ($q);
>>
>> results in an error "Column 'R_REGIONKEY' is ambiguous”. However queries
>> like select R_REGIONKEY from (SELECT * FROM region); work fine, as well as
>> saving the result of the join with CREATE TABLE and then replacing $q with
>> the saved table’s name. Why is that and what are the rules for renaming
>> columns in join queries?
>>
>> 3. I’ve been trying to execute a logical plan using the web interface. It
>> works fine with a simple scan - project query, but when trying to use the
>> output of EXPLAIN … FOR $q (with resultMode changed to “EXEC”) it throws
>> the following error:
>>
>> SYSTEM ERROR: java.lang.IllegalArgumentException: Conflicting
>> property-based creators: already had [constructor for
>> org.apache.drill.common.logical.data.Join, ...
>>
>> the whole logical query and full error message are at
>> https://gist.github.com/pyetras/bf625b6697de62284996
>>
>> 4. What are the supported conditions for joins? The sql interface seems
>> to support only (e1 == e2 [AND])*, but the logical operator reference at
>> https://docs.google.com/document/d/1QTL8warUYS2KjldQrGUse7zp8eA72VKtLOHwfXy6c7I/mobilebasic?pli=1#cmnt7
>> mentions other relations and also cartesian joins. Are those simply not
>> implemented for the sql parser or not supported in Drill at all?
>>
>> Sorry for the long read and thanks for your assistance,
>>
>> --
>> Piotr Sokólski
>>
>>
>
|