On Thu, Dec 8, 2016 at 11:45 PM, Jinfeng Ni <jni@apache.org> wrote:
> Can you please check the Explain plan output for the original query
> and the query against view, and see if there is any difference in the
> two query plans? The difference might be caused by UNION ALL operator,
> which might lead to different parallelization mode.
Hi,
Here are the outputs.
All data in one file:
0: jdbc:drill:zk=local> explain plan for select action['login'],
count(*) from dfs.datastore.events_parquest group by action['login'];
+------+------+
| text | json |
+------+------+
| 00-00 Screen
00-01 Project(EXPR$0=[$0], EXPR$1=[$1])
00-02 UnionExchange
01-01 Project(EXPR$0=[$0], EXPR$1=[$1])
01-02 HashAgg(group=[{0}], EXPR$1=[$SUM0($1)])
01-03 Project(EXPR$0=[$0], EXPR$1=[$1])
01-04 HashToRandomExchange(dist0=[[$0]])
02-01 UnorderedMuxExchange
03-01 Project(EXPR$0=[$0], EXPR$1=[$1],
E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
03-02 HashAgg(group=[{0}], EXPR$1=[COUNT()])
03-03 Project(EXPR$0=[ITEM($0, 'login')])
03-04 Scan(groupscan=[ParquetGroupScan
[entries=[ReadEntryWithPath [path=file:/mnt/data/events_parquest]],
selectionRoot=file:/mnt/data/events_parquest, numFiles=1,
usedMetadataFile=false, columns=[`action`.`login`]]])
Data split into 4 files and combined with UNION ALL:
0: jdbc:drill:zk=local> explain plan for select action['login'],
count(*) from dfs.datastore.parquet_synthetic_events_large_partition_all
group by action['login'];
+------+------+
| text | json |
+------+------+
| 00-00 Screen
00-01 Project(EXPR$0=[$0], EXPR$1=[$1])
00-02 UnionExchange
01-01 Project(EXPR$0=[$0], EXPR$1=[$1])
01-02 HashAgg(group=[{0}], EXPR$1=[$SUM0($1)])
01-03 Project(EXPR$0=[$0], EXPR$1=[$1])
01-04 HashToRandomExchange(dist0=[[$0]])
02-01 UnorderedMuxExchange
03-01 Project(EXPR$0=[$0], EXPR$1=[$1],
E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($0)])
03-02 HashAgg(group=[{0}], EXPR$1=[COUNT()])
03-03 Project(EXPR$0=[ITEM($2, 'login')])
03-04 UnionAll(all=[true])
03-06 Project(timestamp=[$0],
client_id=[$1], action=[$2])
03-08 UnionAll(all=[true])
03-10 Project(timestamp=[$0],
client_id=[$1], action=[$2])
03-12 UnionAll(all=[true])
03-14 Project(timestamp=[$0],
client_id=[$1], action=[$2])
03-16
Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
[path=file:/mnt/data/parquet_synthetic_events_large_partition_0]],
selectionRoot=file:/mnt/data/parquet_synthetic_events_large_partition_0,
numFiles=1, usedMetadataFile=false, columns=[`timestamp`, `client_id`,
`action`]]])
03-13 Project(timestamp=[$0],
client_id=[$1], action=[$2])
03-15
Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
[path=file:/mnt/data/parquet_synthetic_events_large_partition_1]],
selectionRoot=file:/mnt/data/parquet_synthetic_events_large_partition_1,
numFiles=1, usedMetadataFile=false, columns=[`timestamp`, `client_id`,
`action`]]])
03-09 Project(timestamp=[$0],
client_id=[$1], action=[$2])
03-11
Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
[path=file:/mnt/data/parquet_synthetic_events_large_partition_2]],
selectionRoot=file:/mnt/data/parquet_synthetic_events_large_partition_2,
numFiles=1, usedMetadataFile=false, columns=[`timestamp`, `client_id`,
`action`]]])
03-05 Project(timestamp=[$0],
client_id=[$1], action=[$2])
03-07 Scan(groupscan=[ParquetGroupScan
[entries=[ReadEntryWithPath
[path=file:/mnt/data/parquet_synthetic_events_large_partition_3]],
selectionRoot=file:/mnt/data/parquet_synthetic_events_large_partition_3,
numFiles=1, usedMetadataFile=false, columns=[`timestamp`, `client_id`,
`action`]]])
|
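For context, a view that would produce the left-deep chain of UnionAll operators in the second plan could look roughly like the sketch below. The exact definition is an assumption inferred from the plan: the column list (`timestamp`, client_id, action) and the four partition names are taken from the Scan and Project lines, and it is assumed that the dfs.datastore workspace maps to /mnt/data.

```sql
-- Sketch only: reconstructed from the plan output, not the actual view DDL.
-- Assumes each partition directory is visible as a table in dfs.datastore.
create or replace view dfs.datastore.parquet_synthetic_events_large_partition_all as
select `timestamp`, client_id, action
  from dfs.datastore.parquet_synthetic_events_large_partition_0
union all
select `timestamp`, client_id, action
  from dfs.datastore.parquet_synthetic_events_large_partition_1
union all
select `timestamp`, client_id, action
  from dfs.datastore.parquet_synthetic_events_large_partition_2
union all
select `timestamp`, client_id, action
  from dfs.datastore.parquet_synthetic_events_large_partition_3;
```

With a view of this shape, the scans in the second plan read all three columns (columns=[`timestamp`, `client_id`, `action`]) and ITEM($2, 'login') is applied only above the UnionAll, whereas the single-file plan reads just `action`.`login`.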