drill-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Anton Gozhiy <anton5...@gmail.com>
Subject Re: drill parquet - create table as ... partition by ... non present column
Date Thu, 06 Dec 2018 15:42:10 GMT
Hi Benj,

Creating partitions as in your first example won't work.
>From the docs: "During partitioning, Drill creates separate files, but not
separate directories, for different partitions." (
https://drill.apache.org/docs/how-to-partition-data/).
Also, Drill doesn't write additional metadata regarding partitioning, when
it reads parquet files it determines partitions using min/max values.
That means that if you want for example to partition using the first
letter, you'll need to create a corresponding column. Or you can create
partitions manually as directories.

On Wed, Dec 5, 2018 at 10:07 PM <benj.dev@laposte.net.invalid> wrote:

> In would like to create a parquet with a partition on computed data
> (without to have to put the result of the computation in the parquet) :
> The goal is to optimize the parquet for typical expecting queries.
>
> Imaginary example :
> CREATE TABLE `mytable`
> PARTITION BY (substr(name,1,1)) AS
> SELECT name, birthdate, birthcity
> ORDER BY bithdate;
>
> So, if I do that I obtain a VALIDATION ERROR: Partition column ... is not
> in the SELECT list of CTAS
>
> And the comment of the code of the function
> "public static RelNode qualifyPartitionCol(RelNode input, List<String>
> partitionColumns)"
> confirms that it's not possible actually :
> " A partition column is resolved, either (1) the same column appear in the
> select list of CTAS or (2) CTAS has a * in select list"
>
> But what is the reason of this limitation ?
> Is there exists any tricks to do it right now, or can we expect an
> evolution to allow this possibilities.
>
> I just imagine to do (with the data of the example)
> CREATE TABLE `mytable`
> PARTITION BY (sname) AS
> SELECT substr(name,1,1) sname, name, birthdate, birthcity
> ORDER BY bithdate;
> Then, next, request each partition file to remove the useless data
> , like
> CREATE TABLE `mytable_2/partition_x`
> SELECT name, birthdate, birthcity
> ORDER BY bithdate;
> but it's not really satisfying...
>
> I would appreciate yours comments,
> Regards,
>
> benj
>


-- 
Sincerely, Anton Gozhiy
anton5813@gmail.com

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message