drill-dev mailing list archives

From "Aman Sinha (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-4119) Skew in hash distribution for varchar (and possibly other) types of data
Date Sat, 21 Nov 2015 19:06:11 GMT
Aman Sinha created DRILL-4119:
---------------------------------

             Summary: Skew in hash distribution for varchar (and possibly other) types of data
                 Key: DRILL-4119
                 URL: https://issues.apache.org/jira/browse/DRILL-4119
             Project: Apache Drill
          Issue Type: Bug
          Components: Functions - Drill
    Affects Versions: 1.3.0
            Reporter: Aman Sinha
            Assignee: Aman Sinha


We are seeing substantial skew in the hash distribution for an Id column that contains varchar data of length 32.
It is easily reproducible with a group-by query:
{noformat}
EXPLAIN PLAN FOR SELECT SomeId FROM table GROUP BY SomeId;
...
01-02          HashAgg(group=[{0}])
01-03            Project(SomeId=[$0])
01-04              HashToRandomExchange(dist0=[[$0]])
02-01                UnorderedMuxExchange
03-01                  Project(SomeId=[$0], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($0))])
03-02                    HashAgg(group=[{0}])
03-03                      Project(SomeId=[$0])
{noformat}

The string IDs have the following form (32 hexadecimal characters):
{noformat}
e4b4388e8865819126cb0e4dcaa7261d
{noformat}
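
One way to quantify this kind of skew outside of Drill is to bucket a sample of IDs by hash value and compare the largest bucket with the mean bucket size. The following is a minimal sketch under stated assumptions, not Drill's implementation: `make_ids`, `skew_ratio`, `crc32_hash` and `first_char_hash` are hypothetical names, the IDs are synthetic 32-character hex strings, CRC32 is only a stand-in for a reasonable hash (it is not `hash64AsDouble`), and 16 buckets is an arbitrary choice standing in for the number of receiving fragments.

```python
import hashlib
import zlib
from collections import Counter


def make_ids(count):
    """Generate synthetic 32-character hex IDs (stand-ins for real SomeId values)."""
    return [hashlib.md5(str(i).encode()).hexdigest() for i in range(count)]


def skew_ratio(keys, hash_fn, num_buckets):
    """Return largest bucket size divided by mean bucket size.

    A value near 1.0 means the keys are spread evenly; larger values mean skew.
    """
    counts = Counter(hash_fn(key) % num_buckets for key in keys)
    mean = len(keys) / num_buckets
    return max(counts.values()) / mean


def crc32_hash(key):
    # Stand-in for a well-behaved hash; NOT Drill's hash64AsDouble.
    return zlib.crc32(key.encode())


def first_char_hash(key):
    # Deliberately poor hash (uses only the first character), included
    # purely to show what a skewed distribution looks like in this metric.
    return ord(key[0])


if __name__ == "__main__":
    ids = make_ids(100_000)
    print("crc32 skew ratio:     ", skew_ratio(ids, crc32_hash, 16))
    print("first-char skew ratio:", skew_ratio(ids, first_char_hash, 16))
```

To apply the same measurement to the real data, the synthetic IDs could be replaced with values exported from the affected column, and `hash_fn` with a reimplementation of whatever expression the plan uses for distribution.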




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
