spark-user mailing list archives

From Michael Armbrust <>
Subject Re: Hive on Spark vs. SparkSQL using Hive ?
Date Thu, 29 Jan 2015 16:15:00 GMT
I would characterize the difference as follows:

Spark SQL is the native engine for processing structured data using Spark.  In
contrast to Shark or Hive on Spark, it has its own optimizer that was
designed for the RDD model.  It is also the basis of the DataFrame API and
DataSources API.  The former gives you Pandas/R-style manipulation of RDDs
and is used by MLlib pipelines.  The latter is the easiest way in Spark to
access data from various sources, including Parquet, Avro, and JDBC (some of
this is coming soon in Spark 1.3!).

While there are currently some omissions, Spark SQL supports a pretty
significant chunk of HiveQL, including Hive's UDFs, UDAs, UDTs, and SerDes,
as well as the ability to talk to a Hive Metastore.

Hive on Spark is the Apache Hive project, where the map/reduce component
has been replaced with Spark.
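
To make the distinction concrete, here is a minimal sketch of the Spark SQL side, written in PySpark. It assumes the later `SparkSession` API rather than the 1.x contexts that were current when this message was sent, and the table name `people`, the columns `name` and `age`, the application name, and the Parquet path are all hypothetical. Hive on Spark, by contrast, is selected inside Hive itself (the `hive.execution.engine` property set to `spark`), so queries still go through Hive's own compiler and no Spark code is written.

```python
from typing import Any


def build_session() -> Any:
    """Create a Hive-enabled session (needs PySpark and a reachable Metastore)."""
    # Imported lazily so the helper functions below can be read and exercised
    # without a Spark installation.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName("spark-sql-with-hive-metastore")  # hypothetical app name
        .enableHiveSupport()  # lets Spark SQL resolve tables via the Hive Metastore
        .getOrCreate()
    )


def adults_via_hiveql(spark: Any, table: str = "people") -> Any:
    """Query a Metastore table; Spark SQL plans and runs this, not Hive."""
    # `table` and the column names are assumptions made for illustration.
    return spark.sql(f"SELECT name, age FROM {table} WHERE age >= 18")


def adults_via_dataframe(spark: Any, path: str = "/data/people.parquet") -> Any:
    """Ask the same question through the data sources and DataFrame APIs."""
    people = spark.read.parquet(path)  # the path is a hypothetical location
    return people.filter("age >= 18").select("name", "age")
```

With PySpark installed and a Metastore configured, something like `adults_via_hiveql(build_session()).show()` would print the matching rows. Both helpers return a DataFrame, so either result can be refined further with the same operations.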

On Wed, Jan 28, 2015 at 3:24 PM, ogoh <> wrote:

> Hello,
> this question has probably been asked already, but I'd still like to
> confirm with Spark users.
> The following blog post shows 'Hive on Spark'.
> How is it different from using Hive as the data storage for Spark SQL?
> Also, is there any update about SparkSQL's next release (current one is
> still alpha)?
> Thanks,
> OGoh
