spark-dev mailing list archives

From "vector" <799203...@qq.com>
Subject Filter the null before InnerJoin to solve the problem of data skew
Date Tue, 08 Dec 2015 13:58:51 GMT
When I join two tables, I find that one table has a data skew problem, and the skewed value
of the join field is null. So I want to filter out the nulls before the inner join, like this:


For example, a.key is skewed, and the skewed value is null.


Change


"select * from a join b on a.key = b.key"


to


"select * from a join b on a.key = b.key and a.key is not null"


Is this idea feasible?
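
The two queries above can be compared on a small example. This is only a minimal sketch: it uses SQLite (through Python's sqlite3 module) as a stand-in for Spark SQL, and the sample rows and the val column are hypothetical. It relies on standard SQL semantics, where null = null does not evaluate to true, so an inner join on a.key = b.key already drops rows with a null key and the extra condition should not change the result.

```python
import sqlite3

# Hypothetical sample data: table a has many rows with a NULL key (the skew).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (key INTEGER, val TEXT)")
conn.execute("CREATE TABLE b (key INTEGER, val TEXT)")
conn.executemany(
    "INSERT INTO a VALUES (?, ?)",
    [(None, "a_null_%d" % i) for i in range(5)] + [(1, "a_one"), (2, "a_two")],
)
conn.executemany(
    "INSERT INTO b VALUES (?, ?)",
    [(1, "b_one"), (2, "b_two"), (None, "b_null")],
)

# The original query and the proposed rewrite from the message.
original = "select * from a join b on a.key = b.key"
filtered = "select * from a join b on a.key = b.key and a.key is not null"

rows_original = sorted(conn.execute(original).fetchall())
rows_filtered = sorted(conn.execute(filtered).fetchall())

# NULL keys never satisfy a.key = b.key, so both queries return the same rows.
print(rows_original == rows_filtered)
```

This only checks that the results are the same. Whether the filter is actually applied before the shuffle in Spark is a separate matter that depends on the query plan, which could be inspected with EXPLAIN on the rewritten query.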