drill-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Rogers <par0...@yahoo.com.INVALID>
Subject Re: Parquet Predicate Push down not working
Date Thu, 30 Apr 2020 01:10:55 GMT
Hi Navin,

You raise some good questions. I don't have a complete answer, but I can tackle some of the
basics.

Rafael noted that images are blocked on Apache mail lists. I believe you can post images in
the Drill Slack channel. Better, perhaps is to open a JIRA ticket with your images and information
so it is easier for us to track these specific questions & issues.


Drill supports two forms of Parquet predicate push-down. The first is partition pruning, which
removes files based on their directory names. (Let's say you have files in the 2019 and 2020
directories, and have a WHERE clause that limits the query to just the 2020 directory). Partition
pruning should work as long as you explicitly mention the directories:

... WHERE dir0 = "2020"

(Unfortunately, since Drill has no schema, Drill cannot map directories to column names the
way Hive can.)

The simplest, least-fuss way to enable filter push-down is to filter based on directories:
doing so requires no extra schema information be provided to Drill, nor does it require Drill
to do extra work (reading files) when planning a query. Directory pruning works for Parquet
and all other file types as well.


The second form of pruning occurs at the row group level. Here I'll need some help from the
folks that have worked with that code. I'm not sure if the planner will open every file at
plan time to read this information. I do seem to recall that Drill does (did?) gather and
cache the info. There is also a newly-added metadata feature to gather this information once
to avoid per-query scans. Perhaps someone with more current knowledge can fill in the details.

You noted that the filter does not remove records. This is correct. The filter simply tags
records as matching the filter or not. The Selection Vector Remover (SVR) does the actual
removal. The SVR operator is used in other places as well. It is the combination of (Filter
--> SVR) that performs the full filter operation. The (Filter --> SVR) combination will
always run in the same minor fragment, so no extra network I/O occurs.


Another question asked about parallelism. Drill parallelizes based on HDFS file blocks which
are commonly 256 MB or 512 MB. This is classic HDFS "data locality" behavior and is why Rafael
suggests having larger Parquet files. (That said, Drill also parallelizes based on files,
so having many small files should also work, ignoring the classic HDFS "small file problem",
this was a big advantage of the MapR file system, and of S3.)


Your note does suggest another approach, which might work better on "blockless" systems such
as S3 or local disk: parallelize at the row group level. Parquet is complex, we'd have to
understand the costs and benefits of such an approach.

Thanks,
- Paul

 

    On Wednesday, April 29, 2020, 9:35:40 AM PDT, Navin Bhawsar <navin.bhawsar@gmail.com>
wrote:  
 
 Hi  
We are trying to do a simple where clause query with Predicate .Parquet files are created
using python and stored on hdfs.Apache Drill version used is 1.17 .

Below options are set as default required for Predicate Push Down

Drill query is scanning directory with multiple parquet files (total size 1 GB).We are expecting
if predicate push down works it will help reduce scan time which is currently 97 %.If Predicate
push down works row group scan should only fetch 70,840 records instead of 14162187.




| 
Minor Fragment
 | 
NUM_ROWGROUPS
 | 
ROWGROUPS_PRUNED
 | 
NUM_DICT_PAGE_LOADS
 | 
NUM_DATA_PAGE_lOADS
 | 
NUM_DATA_PAGES_DECODED
 | 
NUM_DICT_PAGES_DECOMPRESSED
 | 
NUM_DATA_PAGES_DECOMPRESSED
 | 
TOTAL_DICT_PAGE_READ_BYTES
 | 
TOTAL_DATA_PAGE_READ_BYTES
 | 
TOTAL_DICT_DECOMPRESSED_BYTES
 | 
TOTAL_DATA_DECOMPRESSED_BYTES
 | 
TIME_DICT_PAGE_LOADS
 | 
TIME_DATA_PAGE_LOADS
 | 
TIME_DATA_PAGE_DECODE
 | 
TIME_DICT_PAGE_DECODE
 | 
TIME_DICT_PAGES_DECOMPRESSED
 | 
TIME_DATA_PAGES_DECOMPRESSED
 | 
TIME_DISK_SCAN_WAIT
 | 
TIME_DISK_SCAN
 | 
TIME_FIXEDCOLUMN_READ
 | 
TIME_VARCOLUMN_READ
 | 
TIME_PROCESS
 |
| 
01-00-04
 | 
7
 | 
0
 | 
77
 | 
0
 | 
77
 | 
77
 | 
77
 | 
0
 | 
0
 | 
7,147,852
 | 
8,884,071
 | 
598,070
 | 
0
 | 
97,822
 | 
11,440,739
 | 
2,081,514
 | 
17,694,740
 | 
598,070
 | 
0
 | 
112,108,259
 | 
703,103,096
 | 
815,245,307
 |
| 
01-01-04
 | 
6
 | 
0
 | 
66
 | 
0
 | 
66
 | 
66
 | 
66
 | 
0
 | 
0
 | 
2,115,860
 | 
4,316,153
 | 
1,778,468
 | 
0
 | 
144,320
 | 
3,665,957
 | 
775,403
 | 
8,693,618
 | 
1,778,468
 | 
0
 | 
105,066,657
 | 
776,807,232
 | 
882,070,408
 |
| 
01-02-04
 | 
6
 | 
0
 | 
66
 | 
0
 | 
66
 | 
66
 | 
66
 | 
0
 | 
0
 | 
6,835,560
 | 
8,630,174
 | 
337,404
 | 
0
 | 
100,190
 | 
10,876,145
 | 
1,970,521
 | 
11,789,061
 | 
337,404
 | 
0
 | 
102,833,433
 | 
655,338,696
 | 
758,203,357
 |
| 
01-03-04
 | 
6
 | 
0
 | 
66
 | 
0
 | 
66
 | 
66
 | 
66
 | 
0
 | 
0
 | 
2,242,112
 | 
4,516,183
 | 
1,586,562
 | 
0
 | 
164,398
 | 
3,827,371
 | 
877,814
 | 
8,604,307
 | 
1,586,562
 | 
0
 | 
112,745,628
 | 
758,634,132
 | 
871,586,588
 |
| 
01-04-04
 | 
6
 | 
0
 | 
66
 | 
2
 | 
66
 | 
66
 | 
64
 | 
0
 | 
1,420
 | 
5,407,178
 | 
7,175,446
 | 
2,216,935
 | 
3,181
 | 
74,956
 | 
8,754,425
 | 
1,650,970
 | 
11,241,636
 | 
2,216,935
 | 
0
 | 
97,180,713
 | 
668,249,966
 | 
765,461,684
 |
| 
01-05-04
 | 
6
 | 
0
 | 
66
 | 
1
 | 
66
 | 
66
 | 
65
 | 
0
 | 
92
 | 
1,378,260
 | 
3,595,638
 | 
3,394,196
 | 
1,571
 | 
204,833
 | 
2,726,005
 | 
1,357,297
 | 
6,843,717
 | 
3,394,196
 | 
0
 | 
150,560,569
 | 
704,154,215
 | 
854,928,393
 |
| 
01-06-04
 | 
6
 | 
0
 | 
66
 | 
0
 | 
66
 | 
66
 | 
66
 | 
0
 | 
0
 | 
4,748,302
 | 
6,547,215
 | 
471,679
 | 
0
 | 
114,270
 | 
7,739,335
 | 
1,537,805
 | 
10,571,215
 | 
471,679
 | 
0
 | 
97,392,926
 | 
667,056,499
 | 
764,478,811
 |
| 
01-07-04
 | 
6
 | 
0
 | 
68
 | 
0
 | 
66
 | 
64
 | 
66
 | 
180
 | 
0
 | 
769,746
 | 
3,128,730
 | 
292,603
 | 
0
 | 
130,814
 | 
1,574,574
 | 
425,133
 | 
6,563,457
 | 
286,300
 | 
0
 | 
168,501,325
 | 
716,135,483
 | 
884,850,308
 |
| 
01-08-04
 | 
6
 | 
0
 | 
66
 | 
0
 | 
66
 | 
66
 | 
66
 | 
0
 | 
0
 | 
8,356,637
 | 
9,264,223
 | 
582,946
 | 
0
 | 
101,103
 | 
13,332,669
 | 
2,422,705
 | 
13,340,100
 | 
582,946
 | 
0
 | 
109,932,913
 | 
691,400,457
 | 
801,374,949
 |
| 
01-09-04
 | 
6
 | 
0
 | 
66
 | 
2
 | 
66
 | 
66
 | 
64
 | 
0
 | 
133
 | 
1,453,953
 | 
2,953,546
 | 
19,563,820
 | 
1,920
 | 
149,257
 | 
2,553,666
 | 
632,461
 | 
5,886,238
 | 
19,563,820
 | 
0
 | 
81,854,819
 | 
557,612,832
 | 
639,664,370
 |
| 
01-10-04
 | 
6
 | 
0
 | 
66
 | 
0
 | 
66
 | 
66
 | 
66
 | 
0
 | 
0
 | 
6,634,676
 | 
8,081,684
 |


Please advise if there is any specific options required to enable predicate push down.
Also we expect Filter should filter out records but its done later by SELECTION_VECTOR_REMOVER
operator.There is not enough details on documentation site ,when this operation is triggered.
Thanks,Navin  
Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message