spark-user mailing list archives

From Matei Zaharia
Subject Re: Spark v Redshift
Date Tue, 04 Nov 2014 23:53:22 GMT
BTW while I haven't actually used Redshift, I've seen many companies that use both, usually
using Spark for ETL and advanced analytics and Redshift for SQL on the cleaned / summarized
data. Xiangrui Meng also wrote a library to make it
easy to read data exported from Redshift into Spark or Hadoop.


> On Nov 4, 2014, at 3:51 PM, Matei Zaharia wrote:
>
> Is this about Spark SQL vs Redshift, or Spark in general? Spark in general provides a
> broader set of capabilities than Redshift because it has APIs in general-purpose languages
> (Java, Scala, Python) and libraries for things like machine learning and graph processing.
> For example, you might use Spark to do the ETL that will put data into a database such as
> Redshift, or you might pull data out of Redshift into Spark for machine learning. On the other
> hand, if *all* you want to do is SQL and you are okay with the set of data formats and features
> in Redshift (i.e. you can express everything using its UDFs and you have a way to get data
> in), then Redshift is a complete service which will do more management out of the box.
>
> Matei
>
>> On Nov 4, 2014, at 3:11 PM, agfung wrote:
>>
>> I'm in the midst of a heated debate about the use of Redshift v Spark with a
>> colleague. We keep trading anecdotes and links back and forth (e.g. the Airbnb
>> post from 2013 or the AMPLab benchmarks), and we don't seem to be getting
>> anywhere.
>>
>> So before we start down the prototype/benchmark road, and out of desperation
>> to find *some* kind of objective third-party perspective, I was wondering
>> if anyone who has used both in 2014 would care to provide commentary about
>> the sweet-spot use cases / gotchas for non-trivial use (e.g. a simple filter
>> scan isn't really interesting). Soft issues like operational maintenance
>> and time spent developing v out of the box are interesting too...
