From James Pirz <james.p...@gmail.com>
Subject Running SparkSql against Hive tables
Date Sat, 06 Jun 2015 01:06:20 GMT
I am pretty new to Spark. Using Spark 1.3.1, I am trying to use Spark SQL to
run SQL scripts on the cluster. I understand that for better performance it
is a good idea to use Parquet files. I have two questions regarding that:

1) If I want to use Spark SQL against *partitioned & bucketed* Parquet
tables in Hive, does the Spark binary provided on the Apache website support
that, or do I need to build a new Spark binary with additional flags? (I
found a note in the documentation
<https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables>
about enabling Hive support, but I could not fully work out what the correct
way to build is, if I do need to build.)
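
For context, the build command described in the 1.3.x docs for enabling Hive
support looks like this (the -Phive and -Phive-thriftserver profiles come
from the linked page; the Hadoop profile and version are just an example):

    mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -DskipTests clean package

And the usage I am after is roughly the following (a minimal sketch; the
table name, partition column, and value are hypothetical):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("HiveParquetQuery"))
    // HiveContext reads hive-site.xml from the classpath to find the metastore
    val hiveContext = new HiveContext(sc)
    // my_table is a hypothetical partitioned & bucketed Parquet table in Hive
    hiveContext.sql("SELECT COUNT(*) FROM my_table WHERE dt = '2015-06-01'").show()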

2) Does running Spark SQL against tables in Hive degrade performance? Is it
better to load the Parquet files directly from HDFS, or is having Hive in
the picture harmless?
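
The metastore-free alternative I have in mind is roughly this (a minimal
sketch; the HDFS path is hypothetical, and parquetFile is the 1.3.x
SQLContext method for loading Parquet files directly):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("DirectParquetQuery"))
    val sqlContext = new SQLContext(sc)
    // Read the Parquet files straight off HDFS; no Hive metastore involved
    val df = sqlContext.parquetFile("hdfs:///data/my_table")
    df.registerTempTable("my_table")
    sqlContext.sql("SELECT COUNT(*) FROM my_table").show()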

Thnx
