spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From The Watcher <watche...@gmail.com>
Subject Hive SKEWED feature supported in Spark SQL ?
Date Thu, 19 Feb 2015 10:25:38 GMT
I have done some testing of inserting into tables defined in Hive using 1.2
and I can see that the PARTITION clause is honored : data files get created
in multiple subdirectories correctly.

I tried the SKEWED BY ON STORED AS DIRECTORIES clause on the CREATE TABLE
clause but I didn't see subdirectories being created in that case.

1) is SKEWED BY honored ? If so, has anyone run into directories not being
created ?

2) if it is not honored, does it matter ? Hive introduced this feature to
better handle joins where tables had a skewed distribution on keys joined
on so that the single mapper handling one of the keys didn't hold up the
whole process. Could that happen in Spark / Spark SQL ?

Thanks

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message