spark-user mailing list archives

From Cheng Lian <lian.cs....@gmail.com>
Subject Re: Running SparkSql against Hive tables
Date Sun, 07 Jun 2015 13:39:15 GMT


On 6/6/15 9:06 AM, James Pirz wrote:
> I am pretty new to Spark. Using Spark 1.3.1, I am trying to use 
> Spark SQL to run some SQL scripts on the cluster. I realized that 
> for better performance, it is a good idea to use Parquet files. I 
> have 2 questions regarding that:
>
> 1) If I want to use Spark SQL against *partitioned & bucketed* tables 
> with Parquet format in Hive, does the provided Spark binary on the 
> Apache website support that, or do I need to build a new Spark binary 
> with some additional flags? (I found a note 
> <https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables> in 
> the documentation about enabling Hive support, but I could not fully 
> understand what the correct way of building is, if I need to build.)
Yes, Hive support is enabled by default now for the binaries on the 
website. However, Spark SQL doesn't support bucketed tables yet.
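
A minimal sketch of what this looks like in practice with the shipped 
binaries (the table name "logs" and partition column "ds" below are 
hypothetical):

    // Spark 1.3.x: wrap an existing SparkContext in a HiveContext,
    // which reads table metadata from the Hive metastore
    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)  // sc: existing SparkContext

    // Query a partitioned Hive table ("logs" and "ds" are hypothetical)
    val df = hiveContext.sql("SELECT * FROM logs WHERE ds = '2015-06-01'")
    df.show()

If you do build from source, the -Phive and -Phive-thriftserver Maven 
profiles are what enable this support.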
>
> 2) Does running Spark SQL against tables in Hive degrade 
> performance? Is it better to load Parquet files directly into 
> HDFS, or is having Hive in the picture harmless?
If you're using Parquet, then it should be fine since by default Spark 
SQL uses its own native Parquet support to read Parquet Hive tables.
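
As a rough illustration of both paths (the warehouse path below is 
hypothetical; the conversion flag is on by default):

    // Reading through the Hive metastore: with
    // spark.sql.hive.convertMetastoreParquet=true (the default),
    // Spark SQL uses its native Parquet reader for this table
    hiveContext.setConf("spark.sql.hive.convertMetastoreParquet", "true")
    val viaHive = hiveContext.sql("SELECT * FROM logs")

    // Reading the same Parquet files directly from HDFS, without Hive
    val direct = hiveContext.parquetFile("/user/hive/warehouse/logs")

Either way the scan goes through the same native Parquet code path, so 
keeping Hive in the picture mainly adds the metastore's table and 
partition metadata on top.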
>
> Thanks
>

