drill-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From rahul challapalli <challapallira...@gmail.com>
Subject No of files created by CTAS auto partition feature
Date Wed, 26 Aug 2015 19:21:09 GMT
Drillers,

I executed the below query on TPCH SF100 with drill and it took ~2hrs to
complete on a 2 node cluster.

alter session set `planner.width.max_per_node` = 4;
alter session set `planner.memory.max_query_memory_per_node` = 8147483648;
create table lineitem partition by (l_shipdate, l_receiptdate) as select *
from dfs.`/drill/testdata/tpch100/lineitem`;

The below query returned 75780, so I expected drill to create the same no
of files or may be a little more. But drill created so many files that a
"hadoop fs -count" command failed with a "GC overhead limit exceeded". (I
did not change the default parquet block size)

select count(*) from (select l_shipdate, l_receiptdate from
dfs.`/drill/testdata/tpch100/lineitem` group by l_shipdate, l_receiptdate)
sub;
+---------+
| EXPR$0  |
+---------+
| 75780   |
+---------+


Any thoughts on why drill is creating so many files?

- Rahul

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message