drill-user mailing list archives

From Jeena Vinod <jeena.vi...@oracle.com>
Subject Explain Plan for Parquet data is taking a lot of time
Date Fri, 24 Feb 2017 00:24:37 GMT
Hi, 

 

Drill takes 23 minutes to run a simple select * query with limit 100 on 1GB of uncompressed
Parquet data. EXPLAIN PLAN for this query takes just as long (~23 minutes).

Query: select * from <plugin>.root.`testdata` limit 100;
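
The plan below comes from prefixing the same query with EXPLAIN PLAN FOR. A minimal sketch of that statement, assuming Drill's standard EXPLAIN syntax; <plugin> is the same placeholder used in the query above, not an actual storage plugin name:

```sql
-- Assumed form of the EXPLAIN statement; <plugin> is a placeholder
-- and must be replaced with the configured storage plugin name.
EXPLAIN PLAN FOR
select * from <plugin>.root.`testdata` limit 100;
```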

Query Plan:

00-00    Screen : rowType = RecordType(ANY *): rowcount = 100.0, cumulative cost = {32810.0 rows, 33110.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1429

00-01      Project(*=[$0]) : rowType = RecordType(ANY *): rowcount = 100.0, cumulative cost = {32800.0 rows, 33100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1428

00-02        SelectionVectorRemover : rowType = (DrillRecordRow[*]): rowcount = 100.0, cumulative cost = {32800.0 rows, 33100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1427

00-03          Limit(fetch=[100]) : rowType = (DrillRecordRow[*]): rowcount = 100.0, cumulative cost = {32700.0 rows, 33000.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1426

00-04            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/testdata/part-r-00000-097f7399-7bfb-4e93-b883-3348655fc658.parquet]], selectionRoot=/testdata, numFiles=1, usedMetadataFile=true, cacheFileRoot=/testdata, columns=[`*`]]]) : rowType = (DrillRecordRow[*]): rowcount = 32600.0, cumulative cost = {32600.0 rows, 32600.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1425

 

I am using Drill 1.8, set up on a 5-node, 32GB cluster, and the data is in Oracle Storage
Cloud Service. When I run the same query on a 1GB TSV file in this location, it takes only
38 seconds.

Also, testdata contains around 2144 .parquet files, each around 500KB.

 

Is there any additional configuration required for Parquet?

Kindly suggest how to improve the response time here.

 

Regards
Jeena

 

 

 

 

 
