spark-dev mailing list archives

From Jacek Laskowski <ja...@japila.pl>
Subject Re: Need help with HashAggregateExec, TungstenAggregationIterator and UnsafeFixedWidthAggregationMap
Date Fri, 07 Sep 2018 19:58:45 GMT
Hi Devs,

Sorry for bothering you with my questions (and concerns), but I really need
to understand this piece of code (= my personal challenge :))

Is it true that SparkPlan.doExecute (which "executes" a physical operator)
is only used when whole-stage codegen is disabled (which is not the default)?
May I call this execution path traditional (even "old-fashioned")?

Is it true that these days CodegenSupport.doProduce and
CodegenSupport.doConsume (and others) are what "executes" a physical
operator (i.e. generates the Java source code), since whole-stage code
generation is enabled by default and is currently the proper execution path?

P.S. WholeStageCodegenExec (and InputAdapter) still use doExecute to
trigger whole-stage codegen, but that is the only code doExecute runs on
this path, isn't it?
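
To make the two paths in the questions above concrete, here is a toy sketch in plain Java. It is not Spark code: Op, Scan, KeepEven, Collect and plan are hypothetical names, and the method names execute, produce and consume only mirror doExecute, doProduce and doConsume loosely. It assumes the understanding in the questions is right, namely that one path pulls rows through nested iterators while the other walks the same operator tree once to emit a single fused loop as source text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ExecutionPaths {
    /** Hypothetical stand-in for a physical operator; not Spark's SparkPlan. */
    interface Op {
        Iterator<Integer> execute();   // iterator path (cf. doExecute)
        String produce(Op parent);     // codegen path: start emitting code (cf. doProduce)
        String consume(String row);    // codegen path: code handling one child row (cf. doConsume)
    }

    /** Leaf operator: the source of rows. */
    static final class Scan implements Op {
        private final List<Integer> rows;
        Scan(List<Integer> rows) { this.rows = rows; }
        public Iterator<Integer> execute() { return rows.iterator(); }
        public String produce(Op parent) {
            // The leaf emits the loop and asks its parent for the loop body.
            return "for (int row : input) { " + parent.consume("row") + " }";
        }
        public String consume(String row) {
            throw new UnsupportedOperationException("a leaf has no child to consume from");
        }
    }

    /** Filter operator: keeps even values only. */
    static final class KeepEven implements Op {
        private final Op child;
        private Op parent;
        KeepEven(Op child) { this.child = child; }
        public Iterator<Integer> execute() {
            List<Integer> out = new ArrayList<>();
            child.execute().forEachRemaining(r -> { if (r % 2 == 0) out.add(r); });
            return out.iterator();
        }
        public String produce(Op parent) {
            this.parent = parent;        // remember who consumes our output
            return child.produce(this);  // delegate loop creation to the child
        }
        public String consume(String row) {
            return "if (" + row + " % 2 == 0) { " + parent.consume(row) + " }";
        }
    }

    /** Root operator: collects whatever reaches it. */
    static final class Collect implements Op {
        private final Op child;
        Collect(Op child) { this.child = child; }
        public Iterator<Integer> execute() { return child.execute(); }
        public String produce(Op parent) { return child.produce(this); }
        public String consume(String row) { return "out.add(" + row + ");"; }
    }

    /** Builds Collect <- KeepEven <- Scan over the given rows. */
    static Op plan(List<Integer> rows) {
        return new Collect(new KeepEven(new Scan(rows)));
    }

    public static void main(String[] args) {
        Op plan = plan(Arrays.asList(1, 2, 3, 4));
        // Path 1: rows are pulled through one iterator per operator.
        plan.execute().forEachRemaining(System.out::println);
        // Path 2: the tree is walked once and yields one fused loop as text.
        System.out.println(plan.produce(null));
    }
}
```

In this sketch the generated text is only printed; nothing compiles or runs it, and the real operators are of course far more involved.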

Regards,
Jacek Laskowski
----
https://about.me/JacekLaskowski
Mastering Spark SQL https://bit.ly/mastering-spark-sql
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
Follow me at https://twitter.com/jaceklaskowski


On Fri, Sep 7, 2018 at 7:24 PM Jacek Laskowski <jacek@japila.pl> wrote:

> Hi Spark Devs,
>
> I really need your help understanding the relationship
> between HashAggregateExec, TungstenAggregationIterator and
> UnsafeFixedWidthAggregationMap.
>
> While exploring UnsafeFixedWidthAggregationMap and how it's used, I
> noticed that it is used exclusively by HashAggregateExec and
> TungstenAggregationIterator. Given that TungstenAggregationIterator is
> used exclusively in HashAggregateExec, and that both seem to use
> UnsafeFixedWidthAggregationMap in almost the same way (if not the same),
> I've got a question I cannot seem to answer myself.
>
> Since HashAggregateExec supports whole-stage codegen,
> HashAggregateExec.doExecute won't be used at all; doConsume and doProduce
> are used instead (unless codegen is disabled). Is that correct?
>
> If so, TungstenAggregationIterator is not used at all, but
> UnsafeFixedWidthAggregationMap is used directly instead (in the Java code
> that uses createHashMap or finishAggregate). Is that correct?
>
> Regards,
> Jacek Laskowski
> ----
> https://about.me/JacekLaskowski
> Mastering Spark SQL https://bit.ly/mastering-spark-sql
> Spark Structured Streaming https://bit.ly/spark-structured-streaming
> Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
> Follow me at https://twitter.com/jaceklaskowski
>
