spark-user mailing list archives

From Nicholas Hakobian <nicholas.hakob...@rallyhealth.com>
Subject Re: What is the difference between hive on spark and spark on hive?
Date Mon, 09 Jan 2017 17:50:31 GMT
Hive on Spark is Hive running with Spark as its execution engine: it takes
SQL statements in and creates Spark jobs for processing instead of
MapReduce or Tez jobs.

There is no such thing as "Spark on Hive", but there is SparkSQL. SparkSQL
can either accept programmatic statements or parse SQL statements to
produce a native Spark DataFrame. It does provide connectivity to the Hive
metastore, and in Spark 1.6 it calls into Hive to provide functionality
that doesn't yet exist natively in Spark. I'm not sure how much of that
still exists in Spark 2.0, but I think much of it has been converted into
native Spark functions.
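
As a rough sketch of those two entry points (not a definitive recipe), the
PySpark snippet below runs the same aggregation once as a SQL string and
once through the DataFrame API. It assumes Spark 2.x built with Hive
support and a reachable Hive metastore. The table `employees`, its `dept`
column, and the helper function names are hypothetical and only there for
illustration.

```python
# Sketch only: assumes Spark 2.x with Hive support available and a
# hypothetical metastore table `employees` that has a `dept` column.

QUERY = "SELECT dept, COUNT(*) AS n FROM employees GROUP BY dept"


def build_session(app_name="sparksql-hive-metastore-sketch"):
    """Create a SparkSession that resolves tables through the Hive metastore."""
    # Imported lazily so this module loads even where PySpark is not installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        .enableHiveSupport()  # use the Hive metastore as the table catalog
        .getOrCreate()
    )


def count_by_dept_sql(spark):
    """SQL entry point: Spark parses the statement and returns a DataFrame."""
    return spark.sql(QUERY)


def count_by_dept_api(spark):
    """Programmatic entry point: the same aggregation via the DataFrame API."""
    return (
        spark.table("employees")
        .groupBy("dept")
        .count()  # adds a column named "count"
        .withColumnRenamed("count", "n")
    )
```

With a session from `build_session()`, calling `.show()` on the result of
either function should list the same per-department counts (row order is
not guaranteed). Both go through Spark's own planner and executors; Hive is
only consulted for table metadata.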

There are also the SparkSQL shell and the Thrift server, which provide a
SQL-only interface but still use the native Spark pipeline throughout.

Hope this helps!
-Nick

Nicholas Szandor Hakobian, Ph.D.
Senior Data Scientist
Rally Health
nicholas.hakobian@rallyhealth.com

On Mon, Jan 9, 2017 at 7:05 AM, 李斌松 <libinsong1204@gmail.com> wrote:

> What is the difference between hive on spark and spark on hive?
>
