drill-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Rogers <par0...@yahoo.com.INVALID>
Subject Re: Would you expect coalesce to return a result when none of the columns in coalesce exist in the parquet file ??
Date Fri, 24 Aug 2018 20:07:10 GMT
Hi Deepak,

Drill is "schema-free" (really, schema-on-read). This means Drill behaves differently for
missing columns than traditional schema-on-write databases. In particular, if a reader does
not find the requested column in the input file, Drill will invent a nullable int column and
fill it with nulls. (Drill is optimistic; it hopes that some of your files has a value for
the column, even if some files do not.)

In your first example, you can see the nullable int in the "NullableIntHolder". Apparently
that type conflicts with an expected type of an untyped null holder. (The untyped null holder
must be new; I hadn't seen that previously.) Not sure where the "UntypedNullHolder" came from.

I would file a JIRA ticket for the first case. The documentation says you can provide any
scalar type to COALESCE. I'd suppose they have to be of the same type, but the documentation
does not say so. Certainly COALESCE should accept an untyped null. The output of your example
should have been a null value of type nullable int.

In the second case, the cast converts the nullable int to a nullable varchar. The output you
see is expected: the NULL value of type nullable int is converted to a null value of type
nullable varchar, and, according to the documentation, "If all expressions evaluate to null,
then the COALESCE function returns null." 
- Paul


    On Thursday, August 23, 2018, 5:14:02 PM PDT, Deepak Kapoor <dkapoor101@gmail.com>
 Would you expect coalesce to return a result when none of the columns in coalesce exist in
the parquet file ??

Query 1:
select coalesce(cert_statusdd,cert_status33) from dfs.`/tmp/parquetfiles`;
Error: SYSTEM ERROR: CompileException: Line 56, Column 27: Assignment conversion not possible
from type “org.apache.drill.exec.expr.holders.NullableIntHolder” to type “org.apache.drill.exec.vector.UntypedNullHolder”

Fragment 1:0

[Error Id: 7b9193fb-289b-4fbf-a52a-2b93b01f0cd0 on dkvm2c:31010] (state=,code=0)

In above case both the columns do not exist.  Is the error reported a valid error or is it
a bug.

Another observation is that if you use CAST on the column that does not exist, Coalesce will
not error out instead return nulls. 
Query 2:
select coalesce(cast(cert_statusdd as varchar),cast(cert_statusee as varchar)) from dfs.`/tmp/parquetfiles`
limit 3;
| EXPR$0  |
| null    |
| null    |
| null    |

Note: Databases like postgresql and mysql will not allow COALESCE on non existing column,
w or without CAST. 
Drill however being schema free, do we expect COALESCE to succeed and give null as result
?? In which case the SYSTEM ERROR: CompileException with the 1st Quert  be a bug. ??

- Deepak (dkapoor@mapr.com <mailto:dkapoor@mapr.com>)
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message