spark-user mailing list archives

From Toby Douglass
Subject Re: Shark vs Impala
Date Sun, 22 Jun 2014 16:13:59 GMT
I've just benchmarked Spark and Impala: same data (in S3), same query,
same cluster.

Impala has a long load time, since it cannot load directly from S3.  I have
to create a Hive table on S3, then insert from that into an Impala table,
and that insert is slow.  Spark took about 600s for the query and Impala
about 250s, but Impala first needed about 6,000s to load the data from S3.
If you're going to go the long-initial-load-then-quick-queries route, go for
Redshift.  On equivalent hardware, that took about 4,000s to load, but
queries then take about 5s each.
