Hi,
I have read that in a SELECT from multiple sources (SELECT * FROM
tmp.`myfile*`), the files are processed in random order.
But I don't understand why (Parquet) files that do not have the same
columns are not handled consistently.
Example (on Drill 1.14):
CREATE TABLE tmp2.`mytable1` AS
SELECT 1 AS myc1, 'col3_1' AS myc3;
CREATE TABLE tmp2.`mytable2` AS
SELECT 2 AS myc1, 'col2_2' AS myc2, 'col3_2' AS myc3, 'col4_2' AS myc4;
CREATE TABLE tmp2.`mytable3` AS
SELECT 0 AS myc0, 3 AS myc1, 'col2_3' AS myc2;
SELECT * FROM tmp2.`mytable*`;
| mytable3 | 0 | 3 | col2_3 |
| mytable2 | 1635023213 | 2 | col2_2 |
| mytable1 | 1635023213 | 1 | |
SELECT myc0 FROM tmp2.`mytable*`;
| 0 |
| 1818386772 |
| 1818386772 |
SELECT myc2 FROM tmp2.`mytable*`;
| col2_3 |
| col2_2 |
| |
SELECT myc0, myc1, myc2, myc3, myc4 FROM tmp2.`mytable*`;
| 0 | 3 | col2_3 | null | null |
| 0 | 2 | col2_2 | col3_2 | col4_2 |
| 0 | 1 | | col3_1 | |
Please note that:
- Each of these SELECTs can sometimes return "Error: SYSTEM ERROR:
NullPointerException".
- The undefined columns may have different values in different calls.
- For a given column that is undefined in some files, the value can appear
either as null or as an empty string (illustrated by the last
example).
Maybe this is a consequence of the (random) order in which the files are read.
I can understand that processing different files in the same
query can be difficult, but:
- Why put a (random) value in unknown columns instead of just a
NULL? Always using NULL would make this case manageable.
- An error should appear every time OR never, not randomly.
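To make the expectation concrete, here is a minimal sketch in plain Python (not Drill code, and not a claim about how Drill works internally) of the behavior I would expect. The function name read_all and the dictionary layout are only assumptions for illustration: each table is a list of rows, the output schema is the union of all column names, and any column missing from a row is filled with None (i.e. NULL).

```python
def read_all(tables, columns=None):
    """Concatenate rows from tables with different schemas.

    Hypothetical helper: columns missing from a row are always filled
    with None, never with an arbitrary value or an empty string.
    """
    if columns is None:
        # Union of all column names, sorted so the schema does not
        # depend on the order in which the tables are read.
        columns = sorted(
            {name for rows in tables.values() for row in rows for name in row}
        )
    result = []
    # Read tables in a fixed order so repeated calls return the same rows.
    for table_name in sorted(tables):
        for row in tables[table_name]:
            # dict.get returns None for a column the row does not have.
            result.append(tuple(row.get(col) for col in columns))
    return columns, result


# Same data as the three CREATE TABLE statements in the example.
tables = {
    "mytable1": [{"myc1": 1, "myc3": "col3_1"}],
    "mytable2": [{"myc1": 2, "myc2": "col2_2", "myc3": "col3_2", "myc4": "col4_2"}],
    "mytable3": [{"myc0": 0, "myc1": 3, "myc2": "col2_3"}],
}

columns, rows = read_all(tables)
print(columns)
for row in rows:
    print(row)
```

With this rule, selecting only myc0 (read_all(tables, ["myc0"])) would give NULL for mytable1 and mytable2 and 0 for mytable3, whatever the read order.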
Does anyone have an explanation or a workaround, or is this a well-known
behavior/bug with a fix already planned?
Thanks for any explanations or thoughts,
Regards,