drill-user mailing list archives

From "Jaimes, Rafael - 0993 - MITLL" <Rafael.Jai...@ll.mit.edu>
Subject RE: Parquet Predicate Push down not working
Date Wed, 29 Apr 2020 18:17:06 GMT
Hi Navin,

 

I don’t think inline screenshots work on the mailing list so they are not showing up for
me. I don’t think you have to do anything in Drill 1.17 to enable predicate pushdown for
Parquet.
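
One way to confirm that the defaults are actually in effect is to list the relevant options from sys.options. The sketch below only builds the SQL and the JSON body for Drill's REST query endpoint; it does not contact a server. The option-name prefix is taken from Drill's documented Parquet filter pushdown options and should be treated as an assumption to verify against your installation, and the helper names are hypothetical.

```python
import json

# Assumption: the Parquet filter-pushdown options share this name prefix.
OPTION_PREFIX = "planner.store.parquet.rowgroup.filter.pushdown"


def build_options_query(prefix=OPTION_PREFIX):
    """Return SQL that lists every option whose name starts with `prefix`."""
    return f"SELECT * FROM sys.options WHERE name LIKE '{prefix}%'"


def build_rest_payload(sql):
    """Wrap a SQL string in the JSON body used by Drill's REST query endpoint."""
    return json.dumps({"queryType": "SQL", "query": sql})


if __name__ == "__main__":
    # Hypothetical usage: POST this body to http://<drillbit-host>:8047/query.json
    print(build_rest_payload(build_options_query()))
```

The same SQL can be pasted directly into the Drill web UI or sqlline instead of going through REST.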

 

A 1 GB total dataset is really small. If that’s spread across multiple Parquet files, the row
groups are going to be tiny and performance will be poor. How many files do you have now?

I would aim for 1-2 GB row groups for best Parquet performance. Maybe 512 MB if the computers
building them have low RAM.
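
Since the files are written from Python, a rough way to turn a target row-group size in bytes into a row count is sketched below. It assumes rows are roughly uniform in size and uses the figures from this thread (about 1 GB and 14,162,187 rows). The function name is hypothetical. The result could be passed as the row_group_size argument of pyarrow.parquet.write_table, which is expressed in rows rather than bytes.

```python
def rows_per_row_group(total_rows, total_bytes, target_group_bytes):
    """Estimate how many rows fit in one row group of about target_group_bytes.

    Assumes every row takes roughly total_bytes / total_rows bytes, which is
    only an approximation once encoding and compression are applied.
    """
    if total_rows <= 0 or total_bytes <= 0 or target_group_bytes <= 0:
        raise ValueError("all arguments must be positive")
    rows = total_rows * target_group_bytes // total_bytes
    # Never return fewer than 1 row or more rows than the dataset holds.
    return max(1, min(rows, total_rows))


if __name__ == "__main__":
    one_gib = 1024 ** 3
    # Figures from the thread: ~1 GB of Parquet holding 14,162,187 rows.
    print(rows_per_row_group(14_162_187, one_gib, 512 * 1024 ** 2))
```

With a 1 GB dataset, a 1-2 GB target simply means one row group holding all rows, so the whole dataset would fit in a single file.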

 

Do all the parquet files have 100% identical schema?

 

Can you post your query?

 

-          Raf

 

 

From: Navin Bhawsar <navin.bhawsar@gmail.com> 
Sent: Wednesday, April 29, 2020 12:35 PM
To: user@drill.apache.org
Cc: arun.ns@gmail.com; Navin Bhawsar <navin.bhawsar@gmail.com>
Subject: Parquet Predicate Push down not working

 

Hi  

 

We are trying to run a simple WHERE-clause query with a predicate. The Parquet files are created
using Python and stored on HDFS.

The Apache Drill version used is 1.17.

 

 

The options below, which are required for predicate pushdown, are set to their defaults
(the inline screenshot is not preserved in the archive):



 

The Drill query scans a directory with multiple Parquet files (total size 1 GB).

We expect that, if predicate pushdown works, it will help reduce the scan time, which is currently
97% of the query.

If predicate pushdown works, the row group scan should fetch only 70,840 records instead of 14,162,187.



 


Minor Fragment | NUM_ROWGROUPS | ROWGROUPS_PRUNED | NUM_DICT_PAGE_LOADS | NUM_DATA_PAGE_lOADS | NUM_DATA_PAGES_DECODED | NUM_DICT_PAGES_DECOMPRESSED | NUM_DATA_PAGES_DECOMPRESSED | TOTAL_DICT_PAGE_READ_BYTES | TOTAL_DATA_PAGE_READ_BYTES | TOTAL_DICT_DECOMPRESSED_BYTES | TOTAL_DATA_DECOMPRESSED_BYTES | TIME_DICT_PAGE_LOADS | TIME_DATA_PAGE_LOADS | TIME_DATA_PAGE_DECODE | TIME_DICT_PAGE_DECODE | TIME_DICT_PAGES_DECOMPRESSED | TIME_DATA_PAGES_DECOMPRESSED | TIME_DISK_SCAN_WAIT | TIME_DISK_SCAN | TIME_FIXEDCOLUMN_READ | TIME_VARCOLUMN_READ | TIME_PROCESS
01-00-04 | 7 | 0 | 77 | 0 | 77 | 77 | 77 | 0 | 0 | 7,147,852 | 8,884,071 | 598,070 | 0 | 97,822 | 11,440,739 | 2,081,514 | 17,694,740 | 598,070 | 0 | 112,108,259 | 703,103,096 | 815,245,307
01-01-04 | 6 | 0 | 66 | 0 | 66 | 66 | 66 | 0 | 0 | 2,115,860 | 4,316,153 | 1,778,468 | 0 | 144,320 | 3,665,957 | 775,403 | 8,693,618 | 1,778,468 | 0 | 105,066,657 | 776,807,232 | 882,070,408
01-02-04 | 6 | 0 | 66 | 0 | 66 | 66 | 66 | 0 | 0 | 6,835,560 | 8,630,174 | 337,404 | 0 | 100,190 | 10,876,145 | 1,970,521 | 11,789,061 | 337,404 | 0 | 102,833,433 | 655,338,696 | 758,203,357
01-03-04 | 6 | 0 | 66 | 0 | 66 | 66 | 66 | 0 | 0 | 2,242,112 | 4,516,183 | 1,586,562 | 0 | 164,398 | 3,827,371 | 877,814 | 8,604,307 | 1,586,562 | 0 | 112,745,628 | 758,634,132 | 871,586,588
01-04-04 | 6 | 0 | 66 | 2 | 66 | 66 | 64 | 0 | 1,420 | 5,407,178 | 7,175,446 | 2,216,935 | 3,181 | 74,956 | 8,754,425 | 1,650,970 | 11,241,636 | 2,216,935 | 0 | 97,180,713 | 668,249,966 | 765,461,684
01-05-04 | 6 | 0 | 66 | 1 | 66 | 66 | 65 | 0 | 92 | 1,378,260 | 3,595,638 | 3,394,196 | 1,571 | 204,833 | 2,726,005 | 1,357,297 | 6,843,717 | 3,394,196 | 0 | 150,560,569 | 704,154,215 | 854,928,393
01-06-04 | 6 | 0 | 66 | 0 | 66 | 66 | 66 | 0 | 0 | 4,748,302 | 6,547,215 | 471,679 | 0 | 114,270 | 7,739,335 | 1,537,805 | 10,571,215 | 471,679 | 0 | 97,392,926 | 667,056,499 | 764,478,811
01-07-04 | 6 | 0 | 68 | 0 | 66 | 64 | 66 | 180 | 0 | 769,746 | 3,128,730 | 292,603 | 0 | 130,814 | 1,574,574 | 425,133 | 6,563,457 | 286,300 | 0 | 168,501,325 | 716,135,483 | 884,850,308
01-08-04 | 6 | 0 | 66 | 0 | 66 | 66 | 66 | 0 | 0 | 8,356,637 | 9,264,223 | 582,946 | 0 | 101,103 | 13,332,669 | 2,422,705 | 13,340,100 | 582,946 | 0 | 109,932,913 | 691,400,457 | 801,374,949
01-09-04 | 6 | 0 | 66 | 2 | 66 | 66 | 64 | 0 | 133 | 1,453,953 | 2,953,546 | 19,563,820 | 1,920 | 149,257 | 2,553,666 | 632,461 | 5,886,238 | 19,563,820 | 0 | 81,854,819 | 557,612,832 | 639,664,370
01-10-04 | 6 | 0 | 66 | 0 | 66 | 66 | 66 | 0 | 0 | 6,634,676 | 8,081,684 | …
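
The first two counters of the profile above are the ones that show whether row groups were skipped. A minimal sketch that totals them for the minor fragments listed, and computes the fraction of rows the predicate is expected to keep, could look like this. The values are copied from the message; the function name is hypothetical.

```python
# (minor fragment, NUM_ROWGROUPS, ROWGROUPS_PRUNED) as reported in the profile
FRAGMENTS = [
    ("01-00-04", 7, 0), ("01-01-04", 6, 0), ("01-02-04", 6, 0),
    ("01-03-04", 6, 0), ("01-04-04", 6, 0), ("01-05-04", 6, 0),
    ("01-06-04", 6, 0), ("01-07-04", 6, 0), ("01-08-04", 6, 0),
    ("01-09-04", 6, 0), ("01-10-04", 6, 0),
]


def summarize_pruning(fragments):
    """Return (total row groups, pruned row groups) over all fragments."""
    total = sum(num for _, num, _ in fragments)
    pruned = sum(p for _, _, p in fragments)
    return total, pruned


# Row counts quoted in the message: expected matches vs. rows actually scanned.
EXPECTED_ROWS = 70_840
SCANNED_ROWS = 14_162_187
selectivity = EXPECTED_ROWS / SCANNED_ROWS

if __name__ == "__main__":
    total, pruned = summarize_pruning(FRAGMENTS)
    print(f"{pruned} of {total} row groups pruned")
    print(f"predicate should keep about {selectivity:.2%} of rows")
```

For the fragments shown, no row group was pruned, which is consistent with the scan reading all 14,162,187 rows.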

											

 

Please advise if any specific options are required to enable predicate pushdown.

 

Also, we expected the Filter operator to filter out the records, but that is done later by the
SELECTION_VECTOR_REMOVER operator.

The documentation site does not give enough detail about when this operator is triggered.

 

Thanks,

Navin

