spark-user mailing list archives

From: Matei Zaharia
Subject: Re: [shark-users] SQL on Spark - Shark or SparkSQL
Date: Mon, 31 Mar 2014 02:35:04 GMT

Hi Manoj,

For now, if you need a drop-in replacement for Hive, it is best to stick with Shark.
Over time, Shark will use the Spark SQL backend, but it should remain deployable the way it is
today (including launching the SharkServer, using the Hive CLI, etc.). Spark SQL is better
for accessing Hive data within a Spark program, though: its APIs are richer and easier
to link to than the SharkContext.sql2rdd we previously provided in Shark.

So in a nutshell, if you have a Shark deployment today, or need the HiveServer, then going
with Shark will be fine and we will switch out the backend in a future release (we’ll probably
create a preview of this even before we’re ready to fully switch). If you just want to run
SQL queries or load SQL data within a Spark program, try out Spark SQL.


On Mar 30, 2014, at 4:46 PM, Mayur Rustagi wrote:

> +1. I have done a few installations of Shark with customers using Hive, and they love it.
> It would be good to maintain compatibility with the Metastore & QL until we have a
> substantial reason to break off (like BlinkDB).
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> @mayur_rustagi
> On Sun, Mar 30, 2014 at 2:46 AM, Nicholas Chammas wrote:
> This is a great question. We are in the same position, having not invested in Hive yet
> and looking at various options for SQL-on-Hadoop.
> On Sat, Mar 29, 2014 at 9:48 PM, Manoj Samel wrote:
> Hi,
> In the context of the recent Spark SQL announcement (
> If there is no existing investment in Hive/Shark, would it be worth starting new SQL
> work with SparkSQL rather than Shark?
> * It seems Shark SQL core will use more and more of SparkSQL
> * From the blog, it seems Shark has baggage from Hive that is not needed in this case
> On the other hand, there seem to be two shortcomings of SparkSQL (from a quick scan
> of the blog and docs):
> * SparkSQL will have fewer features than Shark/Hive QL, at least for now.
> * The standalone SharkServer feature will not be available in SparkSQL.
> Can someone from Databricks shed light on the long-term roadmap? It would help us
> avoid investing in an older technology, or in two technologies, for work with no Hive needs.
> Thanks,
> PS: Great work on SparkSQL