spark-user mailing list archives

From Toby Douglass <t...@avocet.io>
Subject high minimum query latency
Date Sun, 29 Jun 2014 09:29:03 GMT
Gents,

I've been benchmarking Presto, Spark, Impala and Redshift.

I've been looking most recently at minimum query latency.
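To make "minimum query latency" concrete, here is a rough sketch of how it might be measured. It runs the same query several times and keeps the fastest wall-clock time, which is a floor rather than an average. `run_query` is a hypothetical stand-in for whatever client call submits a query to Presto, Spark, Impala or Redshift and blocks until the result comes back. No engine-specific API is assumed.

```python
import time
from typing import Callable


def min_query_latency(run_query: Callable[[], object], runs: int = 5) -> float:
    """Return the fastest wall-clock time, in seconds, over `runs` calls.

    `run_query` is a hypothetical zero-argument callable that submits one
    query to the engine under test and blocks until the result is fetched.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        run_query()  # must block until the full result has been returned
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)  # keep only the fastest run
    return best


if __name__ == "__main__":
    # Stand-in workload; swap in a real client call for each engine.
    latency = min_query_latency(lambda: time.sleep(0.05), runs=3)
    print(f"minimum latency: {latency:.3f} s")
```

Taking the minimum rather than the mean reports the best latency each engine can reach on a warm cluster. A mean would also fold in run-to-run variance.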

In all cases, the cluster consists of eight m1.large EC2 instances.

The minimal data set is a single 3.5 MB gzipped file.

With Presto (backed by S3), I see 1 to 2 seconds of latency.

With Impala (backed by HDFS, as Impala does not support S3), I see about 1
second of latency.

With Spark, I see about 9 seconds of latency.

Thoughts?
