Aman Sinha created DRILL-4119:
---------------------------------
Summary: Skew in hash distribution for varchar (and possibly other) types of data
Key: DRILL-4119
URL: https://issues.apache.org/jira/browse/DRILL-4119
Project: Apache Drill
Issue Type: Bug
Components: Functions - Drill
Affects Versions: 1.3.0
Reporter: Aman Sinha
Assignee: Aman Sinha
We are seeing substantial skew for an Id column that contains varchar data of length 32.
It is easily reproducible by a group-by query:
{noformat}
EXPLAIN PLAN FOR SELECT SomeId FROM table GROUP BY SomeId;
...
01-02 HashAgg(group=[{0}])
01-03 Project(SomeId=[$0])
01-04 HashToRandomExchange(dist0=[[$0]])
02-01 UnorderedMuxExchange
03-01 Project(SomeId=[$0], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($0))])
03-02 HashAgg(group=[{0}])
03-03 Project(SomeId=[$0])
{noformat}
The string IDs look like the following (a 32-character hexadecimal string):
{noformat}
e4b4388e8865819126cb0e4dcaa7261d
{noformat}
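To put a number on the skew outside of Drill, something like the sketch below may help. It generates random IDs shaped like the sample above, assigns them to hash partitions, and reports the largest partition relative to the mean. Everything here is an assumption made for illustration: the helper names are hypothetical, the IDs are synthetic, and the hash is a BLAKE2b stand-in, not Drill's hash64AsDouble. Its output is only a baseline for what an even distribution looks like; the per-fragment row counts from a real query profile would be the numbers to compare against it.

```python
import hashlib
import random
from collections import Counter


def random_hex_id(rng, length=32):
    """Return a random lowercase hex string, shaped like the sample ID."""
    return "".join(rng.choice("0123456789abcdef") for _ in range(length))


def stand_in_hash(value):
    """Stand-in 64-bit hash (BLAKE2b); NOT Drill's hash64AsDouble."""
    digest = hashlib.blake2b(value.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bucket_counts(ids, num_buckets, hash_fn=stand_in_hash):
    """Count how many IDs land in each of num_buckets hash partitions."""
    counts = Counter(hash_fn(v) % num_buckets for v in ids)
    return [counts.get(b, 0) for b in range(num_buckets)]


def skew_ratio(counts):
    """Largest bucket divided by the mean bucket size (1.0 means perfectly even)."""
    mean = sum(counts) / len(counts)
    return max(counts) / mean


# Synthetic data: 20000 random IDs spread over 8 partitions (both numbers are arbitrary).
rng = random.Random(4119)
ids = [random_hex_id(rng) for _ in range(20000)]
counts = bucket_counts(ids, num_buckets=8)
print(counts, round(skew_ratio(counts), 3))
```

A ratio close to 1.0 indicates an even spread; a ratio approaching the number of partitions means most rows end up in a single partition. Passing a different hash_fn makes it possible to compare candidate hash functions on the same set of IDs.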
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)