spark-user mailing list archives

From Toby Douglass <>
Subject Re: Shark vs Impala
Date Mon, 23 Jun 2014 12:29:28 GMT
On Sun, Jun 22, 2014 at 5:53 PM, Debasish Das <> wrote:

> 600s for Spark vs 5s for Redshift...The numbers look much different from
> the amplab benchmark...
> Is it like SSDs or something that's helping redshift or the whole data is
> in memory when you run the query ? Could you publish the query ?

I think we'll blog it when it's done.  Still working on it.  This was done
with HD nodes, not SSD.

The query is very simple:

select id, count(*) from data_table group by id;

This is on 52.13 GB of gzipped data, with about 150 distinct IDs.
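
For anyone who wants to sanity-check what that query computes without a cluster, here is a minimal single-machine sketch in Python. It is not the benchmark code. The file layout is an assumption (gzipped, comma-delimited text with the id in the first column), and the function name count_by_id and its parameters are made up for illustration. With only about 150 distinct IDs the counter stays tiny, so nearly all of the work is decompressing and scanning the input.

```python
import csv
import gzip
from collections import Counter


def count_by_id(paths, id_column=0, delimiter=","):
    """Rough equivalent of: select id, count(*) from data_table group by id;

    Assumes (not stated in the thread) that each path is a gzipped,
    delimited text file and that the id sits in column `id_column`.
    """
    counts = Counter()
    for path in paths:
        # Mode "rt" decompresses on the fly and yields text, so the
        # uncompressed data never has to fit in memory at once.
        with gzip.open(path, "rt", newline="") as handle:
            for row in csv.reader(handle, delimiter=delimiter):
                if row:  # skip blank lines
                    counts[row[id_column]] += 1
    return counts
```

Calling count_by_id(["part-0000.csv.gz"]) (a hypothetical file name) returns a Counter mapping each id to its row count, one entry per distinct id.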
