spark-user mailing list archives

From Bobby Evans <bo...@apache.org>
Subject Re: Strange WholeStageCodegen UI values
Date Thu, 09 Jul 2020 21:14:50 GMT
Sadly, there isn't a lot you can do to fix this.  All of the operations take
iterators of rows as input and produce iterators of rows as output.  For
efficiency reasons, the timing is not done for each individual row. If we
did that, in many cases it would take longer to measure how long something
took than it would to just do the operation. So most operators actually end
up measuring their own lifetime, which is often the duration of the entire
task minus the time it took the task to reach that operator. This is also
true of WholeStageCodegen.
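To make the last point concrete, here is a toy model in Python. It is not Spark's actual code: `FakeClock`, `source`, and `LifetimeTimedMap` are hypothetical names, and the fake clock is an assumption so the numbers are deterministic. It assumes a pull-based iterator pipeline like the one described above. The operator starts its timer when its first row is requested and stops it when its input is exhausted, so the time its child spends producing rows is counted as well.

```python
from typing import Callable, Iterable, Iterator, Optional


class FakeClock:
    """Deterministic stand-in for a wall clock (an assumption for this sketch)."""

    def __init__(self) -> None:
        self.now = 0.0

    def advance(self, seconds: float) -> None:
        self.now += seconds


def source(rows: Iterable[int], clock: FakeClock, cost: float) -> Iterator[int]:
    """Hypothetical upstream operator: each row costs `cost` seconds to produce."""
    for row in rows:
        clock.advance(cost)
        yield row


class LifetimeTimedMap:
    """Toy operator that times its lifetime instead of each row."""

    def __init__(self, child: Iterator[int], fn: Callable[[int], int],
                 clock: FakeClock, cost: float) -> None:
        self.child = child
        self.fn = fn
        self.clock = clock
        self.cost = cost                      # this operator's own work per row
        self.start: Optional[float] = None
        self.elapsed = 0.0                    # lifetime: first pull -> exhaustion
        self.own = 0.0                        # what per-row timing would report

    def __iter__(self) -> "LifetimeTimedMap":
        return self

    def __next__(self) -> int:
        if self.start is None:
            self.start = self.clock.now       # timer starts on the first pull
        try:
            row = next(self.child)            # upstream work runs inside this call
        except StopIteration:
            self.elapsed = self.clock.now - self.start
            raise
        t0 = self.clock.now                   # per-row timing needs a clock read per
        self.clock.advance(self.cost)         # row, which real operators avoid
        self.own += self.clock.now - t0
        return self.fn(row)


clock = FakeClock()
op = LifetimeTimedMap(source(range(5), clock, cost=2.0), lambda x: x * 2,
                      clock, cost=0.5)
result = list(op)
print(result, op.elapsed, op.own)  # [0, 2, 4, 6, 8] 12.5 2.5
```

With an upstream cost of 2.0 per row and an own cost of 0.5 per row over 5 rows, the lifetime figure (12.5) is dominated by upstream work, while the operator's own work is only 2.5. In this model the per-row measurement is free because the clock is fake; with a real clock, reading it on every row is the overhead the lifetime approach avoids.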

On Thu, Jul 9, 2020 at 11:55 AM Michal Sankot
<michal.sankot@spreaker.com.invalid> wrote:

> Hi,
> I'm checking the execution of SQL queries in the Spark UI, trying to find
> a bottleneck, and the values displayed in WholeStageCodegen blocks are
> confusing.
>
> In the attached example, the whole query took 6.6 minutes, and the
> upper-left WholeStageCodegen block says that the median value is 7.8
> minutes and the maximum is 7.27h :O
>
> What does it mean? Do those numbers have any real meaning? Is there a way
> to find out how long individual blocks really took?
>
> Thanks,
> Michal
> Spark 2.4.4 on AWS EMR
