spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From The Watcher <watche...@gmail.com>
Subject Spark SQL, Hive & Parquet data types
Date Thu, 19 Feb 2015 22:50:27 GMT
Still trying to get my head around Spark SQL & Hive.

1) Let's assume I *only* use Spark SQL to create and insert data into HIVE
tables, declared in a Hive meta-store.

Does it matter at all if Hive supports the data types I need with Parquet,
or is all that matters what Catalyst & spark's parquet relation support ?

Case in point : timestamps & Parquet
* Parquet now supports them as per
https://github.com/Parquet/parquet-mr/issues/218
* Hive only supports them in 0.14
So would I be able to read/write timestamps natively in Spark 1.2 ? Spark
1.3 ?

I have found this thread
http://apache-spark-user-list.1001560.n3.nabble.com/timestamp-not-implemented-yet-td15414.html
which seems to indicate that the data types supported by Hive would matter
to Spark SQL.
If so, why is that ? Doesn't the read path go through Spark SQL to read the
parquet file ?

2) Is there planned support for Hive 0.14 ?

Thanks

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message