drill-user mailing list archives

From "benj.dev" <benj....@laposte.net.INVALID>
Subject Problem when using files with differents schemas in the same SELECT
Date Wed, 02 Jan 2019 17:52:31 GMT
Hi,

I have read that when a SELECT reads from multiple sources (SELECT * FROM
tmp.`myfile*`), the files are processed in random order.
But I don't understand why (Parquet) files that do not have the same
columns are not handled consistently.

Example (on Drill 1.14):

CREATE TABLE tmp2.`mytable1` AS SELECT 1 AS myc1, 'col3_1' AS myc3;
CREATE TABLE tmp2.`mytable2` AS SELECT 2 AS myc1, 'col2_2' AS myc2, 'col3_2' AS myc3, 'col4_2' AS myc4;
CREATE TABLE tmp2.`mytable3` AS SELECT 0 AS myc0, 3 AS myc1, 'col2_3' AS myc2;

SELECT * FROM tmp2.`mytable*`;
| mytable3  | 0           | 3     | col2_3  |
| mytable2  | 1635023213  | 2     | col2_2  |
| mytable1  | 1635023213  | 1     |         |

SELECT myc0 FROM tmp2.`mytable*`;
| 0           |
| 1818386772  |
| 1818386772  |

SELECT myc2 FROM tmp2.`mytable*`;
| col2_3  |
| col2_2  |
|         |

SELECT myc0, myc1, myc2, myc3, myc4 FROM tmp2.`mytable*`;
| 0     | 3     | col2_3  | null    | null    |
| 0     | 2     | col2_2  | col3_2  | col4_2  |
| 0     | 1     |         | col3_1  |         |

Please note that:
- Each of these SELECTs can sometimes return "Error: SYSTEM ERROR:
NullPointerException".
- The undefined columns may have different values in different calls.
- For a given column that is undefined in some files, the column can
appear either with a null value or with an empty string (illustrated by
the last example).
  Maybe this is a consequence of the (random) file order in the SELECT.

I can understand that processing different files in the same query can
be difficult, but:
- Why put a (random) value in unknown columns instead of just a NULL?
Always returning NULL would make this case manageable.
- An error should appear either every time or never, not randomly.
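
To make the expected behavior concrete, the NULL-filled result can be spelled out by hand over the three tables from the example. This is only an untested sketch: the column types used in the casts (INTEGER for myc0, VARCHAR for the string columns) are assumptions based on the CTAS statements above.

```sql
-- Sketch only: writes the expected NULL-filled result explicitly,
-- one branch per table, instead of relying on tmp2.`mytable*`.
-- The cast types (INTEGER, VARCHAR) are assumptions.
SELECT CAST(NULL AS INTEGER) AS myc0, myc1,
       CAST(NULL AS VARCHAR) AS myc2, myc3,
       CAST(NULL AS VARCHAR) AS myc4
FROM tmp2.`mytable1`
UNION ALL
SELECT CAST(NULL AS INTEGER) AS myc0, myc1, myc2, myc3, myc4
FROM tmp2.`mytable2`
UNION ALL
SELECT myc0, myc1, myc2,
       CAST(NULL AS VARCHAR) AS myc3,
       CAST(NULL AS VARCHAR) AS myc4
FROM tmp2.`mytable3`;
```

Every column missing from a file shows up as NULL here, which is the result I would expect from the wildcard query as well.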

Does anyone have an explanation or a workaround, or is this a well-known
behavior/bug with a fix already planned?

Thanks for any explanations or comments,
Regards,
