spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tin Vu <tvu...@ucr.edu>
Subject Re: [SparkSQL] SparkSQL performance on small TPCDS tables is very low when compared to Drill or Presto
Date Thu, 29 Mar 2018 14:46:47 GMT
 You are right. There are too much tasks was created. How can we reduce the
number of tasks?

On Thu, Mar 29, 2018, 7:44 AM Lalwani, Jayesh <Jayesh.Lalwani@capitalone.com>
wrote:

> Without knowing too many details, I can only guess. It could be that Spark
> is creating a lot of tasks even though there are less records. Creation and
> distribution of tasks has a noticeable overhead on smaller datasets.
>
>
>
> You might want to look at the driver logs, or the Spark Application Detail
> UI.
>
>
>
> *From: *Tin Vu <tvu032@ucr.edu>
> *Date: *Wednesday, March 28, 2018 at 8:04 PM
> *To: *"user@spark.apache.org" <user@spark.apache.org>
> *Subject: *[SparkSQL] SparkSQL performance on small TPCDS tables is very
> low when compared to Drill or Presto
>
>
>
> Hi,
>
>
>
> I am executing a benchmark to compare performance of SparkSQL, Apache
> Drill and Presto. My experimental setup:
>
> ·         TPCDS dataset with scale factor 100 (size 100GB).
>
> ·         Spark, Drill, Presto have a same number of workers: 12.
>
> ·         Each worked has same allocated amount of memory: 4GB.
>
> ·         Data is stored by Hive with ORC format.
>
> I executed a very simple SQL query: "SELECT * from table_name"
> The issue is that for some small size tables (even table with few dozen of
> records), SparkSQL still required about 7-8 seconds to finish, while Drill
> and Presto only needed less than 1 second.
> For other large tables with billions records, SparkSQL performance was
> reasonable when it required 20-30 seconds to scan the whole table.
> Do you have any idea or reasonable explanation for this issue?
>
> Thanks,
>
>
>
> ------------------------------
>
> The information contained in this e-mail is confidential and/or
> proprietary to Capital One and/or its affiliates and may only be used
> solely in performance of work or services for Capital One. The information
> transmitted herewith is intended only for use by the individual or entity
> to which it is addressed. If the reader of this message is not the intended
> recipient, you are hereby notified that any review, retransmission,
> dissemination, distribution, copying or other use of, or taking of any
> action in reliance upon this information is strictly prohibited. If you
> have received this communication in error, please contact the sender and
> delete the material from your computer.
>

Mime
View raw message