spark-user mailing list archives

From Rohit Rai <>
Subject Re: Announcing Spark SQL
Date Sat, 29 Mar 2014 04:53:39 GMT
Thanks Patrick,

I was thinking about that... Upon analysis I realized (on date) it would be
something similar to the way HiveContext uses the CustomCatalog stuff.
I will review it again, along the lines of implementing SchemaRDD with
Cassandra. Thanks for the pointer.

Upon discussion with a couple of our clients, it seems the reason they would
prefer using Hive is that they have already invested a lot in it, mostly in
UDFs and HiveQL.
1. Are there any plans to develop the SQL parser to handle more complex
queries like HiveQL? Can we just plug in a custom parser instead of bringing
in the whole Hive deps?
2. Is there any way we can support UDFs in Catalyst without using Hive? It
will be fine if we don't support Hive UDFs as is and need minor porting.
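
To make question 2 concrete, here is a minimal sketch of what a Hive-free UDF mechanism could look like in principle: a table of named functions that a query evaluator consults at call time. This is only an illustration and not Catalyst's actual API; `FunctionRegistry`, `register`, `call`, and the `str_len` function are all hypothetical names.

```python
from typing import Any, Callable, Dict


class FunctionRegistry:
    """Hypothetical name -> function table; NOT Catalyst's real API."""

    def __init__(self) -> None:
        self._functions: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        # Store under a lower-cased key so lookups are case-insensitive,
        # as SQL function names conventionally are.
        self._functions[name.lower()] = fn

    def call(self, name: str, *args: Any) -> Any:
        try:
            fn = self._functions[name.lower()]
        except KeyError:
            raise KeyError(f"undefined function: {name}") from None
        return fn(*args)


registry = FunctionRegistry()
registry.register("str_len", lambda s: len(s))

rows = [{"name": "spark"}, {"name": "cassandra"}]
# Equivalent in spirit to: SELECT STR_LEN(name) FROM rows
lengths = [registry.call("STR_LEN", row["name"]) for row in rows]
```

Porting an existing Hive UDF to a scheme like this would mean rewriting its body as a plain function and registering it by name, which is the kind of "minor porting" mentioned above.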


*Founder & CEO, **Tuplejump, Inc.*
*The Data Engineering Platform*

On Fri, Mar 28, 2014 at 12:48 AM, Patrick Wendell <> wrote:

> Hey Rohit,
> I think external tables based on Cassandra or other datastores will work
> out of the box if you build Catalyst with Hive support.
> Michael may have feelings about this but I'd guess the longer term design
> for having schema support for Cassandra/HBase etc. likely wouldn't rely on
> Hive external tables because it's an unnecessary layer of indirection.
> Spark should be able to directly load a SchemaRDD from Cassandra by just
> letting the user give relevant information about the Cassandra schema. And
> it should let you write back to Cassandra by giving a mapping of fields to
> the respective Cassandra columns. I think all of this would be fairly easy
> to implement on SchemaRDD and likely will make it into Spark 1.1.
> - Patrick
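
The write-back idea in Patrick's message, "giving a mapping of fields to the respective Cassandra columns", can be sketched roughly as follows. This is not from the thread and not a Spark or Cassandra driver API: `build_insert`, the table `demo.users`, and the field and column names are hypothetical. The sketch only builds a parameterized CQL `INSERT` statement and its bound values; actually executing it would need a Cassandra driver.

```python
from typing import Dict, List, Tuple


def build_insert(
    table: str,
    field_to_column: Dict[str, str],
    record: Dict[str, object],
) -> Tuple[str, List[object]]:
    """Build a parameterized CQL INSERT from a field -> column mapping.

    Hypothetical helper for illustration only.
    """
    # Sort the field names so the generated statement is deterministic.
    fields = sorted(field_to_column)
    missing = [f for f in fields if f not in record]
    if missing:
        raise ValueError(f"record is missing mapped fields: {missing}")

    columns = ", ".join(field_to_column[f] for f in fields)
    placeholders = ", ".join("?" for _ in fields)
    statement = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    values = [record[f] for f in fields]
    return statement, values


statement, values = build_insert(
    "demo.users",
    {"userId": "user_id", "fullName": "full_name"},
    {"userId": 7, "fullName": "Ada"},
)
```

Keeping the mapping as plain data means the same description could in principle drive both directions: reading Cassandra columns into schema fields and writing fields back to columns.
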
> On Wed, Mar 26, 2014 at 10:59 PM, Rohit Rai <> wrote:
>> Great work guys! Have been looking forward to this . . .
>> In the blog it mentions support for reading from HBase/Avro... What will
>> be the recommended approach for this? Will it be writing custom wrappers
>> for SQLContext like in HiveContext or using Hive's "EXTERNAL TABLE" support?
>> I ask this because a few days back (based on your pull request on GitHub)
>> I started analyzing what it would take to support Spark SQL on Cassandra.
>> One obvious approach will be to use Hive External Table support with our
>> cassandra-hive handler. But the second approach sounds tempting as it will give
>> more fidelity.
>> Regards,
>> Rohit
>> *Founder & CEO, **Tuplejump, Inc.*
>> ____________________________
>> *The Data Engineering Platform*
>> On Thu, Mar 27, 2014 at 9:12 AM, Michael Armbrust <> wrote:
>>> Any plans to make the SQL typesafe using something like Slick (
>>> I would really like to do something like that, and maybe we will in a
>>> couple of months. However, in the near term, I think the top priorities are
>>> going to be performance and stability.
>>> Michael
