I would characterize the difference as follows:

Spark SQL is the native engine for processing structured data using Spark.  In contrast to Shark or Hive on Spark, it has its own optimizer that was designed for the RDD model.  It is also the basis of the DataFrame API and the Data Sources API.  The former gives you Pandas/R-style manipulation of RDDs and is used by MLlib pipelines.  The latter is the easiest way in Spark to access data from various sources, including Parquet, Avro, and JDBC (some of this coming soon in Spark 1.3!).
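As a rough sketch in the spark-shell (this assumes an existing SparkContext `sc` and a hypothetical Parquet path; the API shown is the 1.2-era SchemaRDD style, which the upcoming DataFrame API builds on):

    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    // Data Sources API: load a Parquet file directly into a SchemaRDD
    val people = sqlContext.parquetFile("hdfs:///data/people.parquet")  // path is hypothetical
    people.registerTempTable("people")
    // Query the structured data with SQL; the plan goes through Spark SQL's optimizer
    sqlContext.sql("SELECT name FROM people WHERE age >= 21").collect().foreach(println)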

While there are currently some omissions, Spark SQL supports a pretty significant chunk of HiveQL, including Hive UDFs, UDAFs, UDTFs, and SerDes, as well as the ability to talk to a Hive metastore.
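A minimal sketch of that path, assuming a spark-shell built with Hive support (the "src" table is just an illustrative name):

    import org.apache.spark.sql.hive.HiveContext
    val hiveContext = new HiveContext(sc)
    // HiveQL is parsed and executed by Spark SQL; table metadata comes from the Hive metastore
    hiveContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
    hiveContext.sql("SELECT key, count(*) FROM src GROUP BY key").collect().foreach(println)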

Hive on Spark is the Apache Hive project itself, with the MapReduce execution engine replaced by Spark.
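With a Hive build that has Spark support, it is enabled from the Hive side rather than from Spark, e.g. in the Hive CLI (the config key is from the Hive on Spark docs; the query is just illustrative):

    set hive.execution.engine=spark;
    SELECT key, count(*) FROM src GROUP BY key;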



On Wed, Jan 28, 2015 at 3:24 PM, ogoh <okehee@gmail.com> wrote:

Hello,
This question has probably been asked already, but I'd still like to confirm with
Spark users.

The following blog post shows 'Hive on Spark':
http://blog.cloudera.com/blog/2014/12/hands-on-hive-on-spark-in-the-aws-cloud/
How is it different from using Hive as the data storage for Spark SQL
(http://spark.apache.org/docs/latest/sql-programming-guide.html)?
Also, is there any update on Spark SQL's next release (the current one is
still alpha)?

Thanks,
OGoh
