drill-user mailing list archives

From Andries Engelbrecht <aengelbre...@maprtech.com>
Subject Re: Creating a single parquet or csv file using CTAS command?
Date Thu, 04 Feb 2016 16:15:02 GMT
Is there a reason to create a single file? Typically you want more files, since they improve parallel
operation on distributed systems like Drill.

That said, if you have a single-node Drill cluster (or embedded mode), you can reduce the threads
to a single thread and increase the Parquet block size to fit the whole data set. Be prepared for
things to slow down substantially, both when running the CTAS and when querying the file.

alter session set `planner.width.max_per_node` = 1
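To confirm the setting took effect in your session, you can look it up in Drill's `sys.options` table. This is a minimal sketch; the exact columns returned vary between Drill versions, so it simply selects all of them.

```sql
-- Look up the current value of the per-node parallelism option
select *
from sys.options
where name = 'planner.width.max_per_node';
```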

Then set the Parquet block size as large as needed. I'm not sure of the limits, but note that
enough memory is also required to support a block that size.

alter session set `store.parquet.block-size` = <size in bytes>
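Put together, the session might look like the sketch below. It assumes a single-node cluster or embedded mode and the default writable `dfs.tmp` workspace. The input path `/path/to/input` and the table name `single_file_out` are hypothetical placeholders, and the 1 GiB block size is only an example that should exceed the expected output size.

```sql
-- Run all fragments on one thread per node
alter session set `planner.width.max_per_node` = 1;
-- 1 GiB = 1024 * 1024 * 1024 = 1073741824 bytes (example value)
alter session set `store.parquet.block-size` = 1073741824;
-- Output format for CTAS; 'parquet' is Drill's default
alter session set `store.format` = 'parquet';
-- Hypothetical table name and input path
create table dfs.tmp.`single_file_out` as
select * from dfs.`/path/to/input`;
```

For CSV output, `store.format` can be set to `'csv'` instead. The Parquet block-size option only applies to Parquet output, so in that case it is the single-thread setting that matters.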

Perhaps someone else knows a different way to do it. Either way, consider the implications
of creating a single file.

--Andries

> On Feb 4, 2016, at 6:38 AM, Peder Jakobsen | gmail <pjakobsen@gmail.com> wrote:
> 
> Hi, is there a way to force Drill to create a single file when performing a
> CTAS command (or some other method)?
> 
> Right now, I'm creating CSV files, and then have to perform an extra step
> to stitch 1_0_0.parquet, 1_1_0.parquet, 1_2_0.parquet, etc. together into a
> single file.
> 
> Thank you.
> 
> Peder

