drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mehant Baid" <baid.meh...@gmail.com>
Subject Re: Review Request 15822: Remove _MAP[] from dirll; JIRA# 307
Date Tue, 03 Dec 2013 10:14:40 GMT

This is an automatically generated e-mail. To reply, visit:

(Updated Dec. 3, 2013, 10:14 a.m.)

Review request for drill and Jacques Nadeau.


Rebased to the latest master.

Made code changes as per Jacques's review comments. 

Repository: drill-git

Description (updated)

Currently while executing SQL we need to use _MAP['columnname'] in the select statements.

The problem while using select statements with columnnames instead of _MAP['columnname'] is
that Optiq tries to verify that these columns exist in the schema and raises validation errors.

The way we've solved the problem is to override the 'rowtype' returned to Optiq while creating
a Drill table. We have defined a new class DynamicRecordType which is the 'rowtype' returned
to Optiq and will be invoked for any validation purposes. DynamicRecordType always returns
true when queried if a certain field exists in the table, with its type as 'ANY'. This gets
us past the validation in Optiq and allows us to use column names directly in select statements.


Previously we had the notion that the scan operator has a single column called _MAP, which
used to be the reference for the scan operator. So all the actual columns would be referenced
as _MAP.columnname. We have removed reference (ref) in the scan operator (since we don't need
it) so we simply use column name to get the value vector. 

Modified setClearAndSetRoot to be setRoot. Any row read via the reference interpreter's JSON
reader previously needed to be surrounded by a MAP type so that the project expression _MAP.columnname
would be correctly evaluated. Since we don't have the notion of _MAP anymore, I have changed
the plans such that the expression (project) contains only column names, hence we don't need
to surround every row with a MAP type. 

Instead of using Optiq's rowtype, the DrillTable will use the new DynamicRecordType as the
rowtype. We need to use this rowtype, so that when Optiq is performing validation we invoke
functions in this class that will allow validation to pass smoothly. More details about how
this works is in DynamicRecordType.java

We use an instance of this class as the row type for DrillTable. Since we don't know the schema
before hand whenever Optiq requires us to validate that a field exists we always return true
and indicate that the type of that field is 'ANY'. By default we have one field in this table
= '*', type is 'ANY'. This field is basically used during 'select *' type queries. We use
'*' as a special character which essentially means get all columns in the record batch. In
BatchLoaderMap while returning the results, we detect that the requested field is using this
special character ('*') and return all fields in the record batch as opposed to in the normal
case where we return only the requested field.

We no longer need a 'ref' for the scan operator. Removed it.

Modified the special character to be used to detect that all the fields in the record batch
are to be returned as result as opposed to only the requested field from "_MAP" to "*"

The schema for the table is generated from the query, meaning the field list is essentially
all the fields requested. Since different queries can request different fields it does not
make sense to store schema information. In the next iteration it would be useful to store
the schema once we perform a scan and know the actual fields from the data. 

If the ProjectRel contains only a single projection, whose expression is our special character
(= name of default field in DynamicRecordType) then we know that all the fields in the record
batch are to be returned. So we don't add a Project operator in that case. 

Test changes (*.json and test/**/*.java)
All test changes fall into one of the following categories:

* Upper case column names. Optiq upper cases the column names specified in the query, so we
look for upper cased column names while fetching the data. If our column names in the data
isn't upper cased we will return NULL. This wasn't a problem previously since we used to specify
the column name within single quotes (something like _MAP['columnname']) which optiq didn't
upper case.

* Remove the use of 'ref'. Since we don't have a 'ref' for the scan operator we don't need
to use tablename.columnname or _MAP.columnname.

Diffs (updated)

  exec/java-exec/src/main/java/org/apache/drill/exec/store/json/JSONRecordReader.java 49aea22

  exec/ref/src/main/java/org/apache/drill/exec/ref/UnbackedRecord.java 29c83d4 
  exec/ref/src/main/java/org/apache/drill/exec/ref/rse/JSONRecordReader.java f73bee5 
  exec/ref/src/test/java/org/apache/drill/exec/ref/rops/CollapsingAggregateTest.java d63c0cc

  exec/ref/src/test/java/org/apache/drill/exec/ref/rops/OrderROPTest.java d4ea473 
  exec/ref/src/test/java/org/apache/drill/exec/ref/rops/WindowFrameROPTest.java 735c9cb 
  exec/ref/src/test/resources/collapse/test1.json d4b1e24 
  exec/ref/src/test/resources/departments.json 3cf0a85 
  exec/ref/src/test/resources/donuts.json d16ef46 
  exec/ref/src/test/resources/employees.json 83f68bd 
  exec/ref/src/test/resources/join/full_join.json 59a53f3 
  exec/ref/src/test/resources/join/left_join.json 96e6a79 
  exec/ref/src/test/resources/join/simple_join.json 1587a78 
  exec/ref/src/test/resources/order/nulls-first.json 6e541e9 
  exec/ref/src/test/resources/order/nulls-last.json 83338b7 
  exec/ref/src/test/resources/union/distinct.json 9361009 
  exec/ref/src/test/resources/union/nondistinct.json 35ad70e 
  sqlparser/src/main/java/org/apache/drill/jdbc/DrillTable.java f51021e 
  sqlparser/src/main/java/org/apache/drill/jdbc/DynamicRecordType.java PRE-CREATION 
  sqlparser/src/main/java/org/apache/drill/optiq/DrillProjectRel.java fabfb77 
  sqlparser/src/main/java/org/apache/drill/optiq/DrillScan.java a165fea 
  sqlparser/src/main/java/org/apache/drill/sql/client/full/BatchLoaderMap.java d444348 
  sqlparser/src/main/java/org/apache/drill/sql/client/full/FileSystemSchema.java 1650aaf 
  sqlparser/src/test/java/org/apache/drill/jdbc/test/JdbcTest.java 9ae9682 
  sqlparser/src/test/resources/test-models.json 23895c9 

Diff: https://reviews.apache.org/r/15822/diff/



Mehant Baid

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message