spark-dev mailing list archives

From Jacek Laskowski <ja...@japila.pl>
Subject Re: Need help with HashAggregateExec, TungstenAggregationIterator and UnsafeFixedWidthAggregationMap
Date Sat, 08 Sep 2018 09:55:18 GMT
Thanks Russ! That helps a lot.

On the other hand, it makes reviewing the Spark SQL codebase slightly harder,
since Java code generation is so much about string concatenation :(
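
To make the string-concatenation point concrete, here is a toy model of the produce/consume pattern. It is a minimal sketch under stated assumptions, not Spark's API: the classes RangeOp, FilterOp and CollectOp and the helpers indent and compile_stage are hypothetical. The sketch generates Python source and runs it with exec, whereas Spark generates Java source and compiles it with Janino. What it shows is the shape of the technique: the leaf operator owns the loop, each parent splices its consume fragment into the loop body, and the whole stage is compiled once.

```python
# Toy model of produce/consume code generation (NOT Spark's API).
# Each operator contributes a fragment of source code as a string; the
# fragments are concatenated into one function, compiled once, and run.

def indent(code):
    return "".join("    " + line + "\n" for line in code.splitlines())


class RangeOp:
    """Leaf operator: emits a loop over 0..n-1 (hypothetical stand-in for a scan)."""

    def __init__(self, n):
        self.n = n
        self.parent = None

    def produce(self):
        # The leaf owns the loop; the loop body is whatever the parent consumes.
        body = self.parent.consume("i")
        return f"for i in range({self.n}):\n" + indent(body)


class FilterOp:
    """Keeps even values (hard-coded predicate for brevity)."""

    def __init__(self, child):
        self.child = child
        child.parent = self
        self.parent = None

    def produce(self):
        # No loop of its own: delegate to the child.
        return self.child.produce()

    def consume(self, var):
        return f"if {var} % 2 == 0:\n" + indent(self.parent.consume(var))


class CollectOp:
    """Root operator: appends each surviving value to the output list."""

    def __init__(self, child):
        self.child = child
        child.parent = self

    def produce(self):
        return self.child.produce()

    def consume(self, var):
        return f"out.append({var})\n"


def compile_stage(root):
    # Concatenate all fragments into a single function, then compile it once.
    src = "def stage():\n    out = []\n" + indent(root.produce()) + "    return out\n"
    namespace = {}
    exec(src, namespace)
    return src, namespace["stage"]


src, stage = compile_stage(CollectOp(FilterOp(RangeOp(10))))
print(src)
print(stage())  # [0, 2, 4, 6, 8]
```

As with this toy, the printed source is easier to review than the concatenation that builds it. For real queries, the generated Java can be printed with debugCodegen() after importing org.apache.spark.sql.execution.debug._.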

P.S. Should all the code in doExecute then be considered deprecated (and marked @deprecated)?
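
Whether doExecute could be deprecated depends on whether anything still reaches it. The toy model below is a sketch, not Spark's API: AddOne, Double, AbsUdf and execute_plan are hypothetical, and the whole_stage parameter merely stands in for spark.sql.codegen.wholeStage (true by default). It assumes two cases where an interpreted path is still needed: codegen switched off, and an operator that cannot generate code, whose output must be fed into the next fused stage (loosely analogous to what InputAdapter does at a codegen stage boundary).

```python
# Toy model (NOT Spark's API) of why an interpreted path such as doExecute
# is still reachable when whole-stage codegen is on by default.

class AddOne:
    supports_codegen = True

    def do_execute(self, rows):  # interpreted, row-at-a-time path
        return [r + 1 for r in rows]

    def gen_expr(self, child_expr):  # source fragment for the fused stage
        return f"({child_expr}) + 1"


class Double:
    supports_codegen = True

    def do_execute(self, rows):
        return [r * 2 for r in rows]

    def gen_expr(self, child_expr):
        return f"({child_expr}) * 2"


class AbsUdf:
    # Hypothetical stand-in for an operator that cannot generate code.
    supports_codegen = False

    def do_execute(self, rows):
        return [abs(r) for r in rows]


def execute_plan(ops, rows, whole_stage=True):
    """Run operators bottom-up. Consecutive codegen-capable operators are
    fused into one generated function; everything else uses do_execute."""
    pending = []  # codegen-capable operators waiting to be fused

    def flush(rows):
        if not pending:
            return rows
        expr = "r"
        for op in pending:
            expr = op.gen_expr(expr)  # wrap: parent(child(r))
        fn = eval("lambda r: " + expr)  # "compile" the fused stage once
        pending.clear()
        return [fn(r) for r in rows]

    for op in ops:
        if whole_stage and op.supports_codegen:
            pending.append(op)
        else:
            # Codegen disabled or unsupported operator: finish any fused
            # stage, then run this operator through the interpreted path.
            rows = flush(rows)
            rows = op.do_execute(rows)
    return flush(rows)


plan = [AddOne(), AbsUdf(), Double()]
print(execute_plan(plan, [-3, 0, 5]))                     # [4, 2, 12]
print(execute_plan(plan, [-3, 0, 5], whole_stage=False))  # [4, 2, 12]
```

Both paths must produce the same rows, so in this model the interpreted path cannot be dropped as long as either case can occur.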

Regards,
Jacek Laskowski
----
https://about.me/JacekLaskowski
Mastering Spark SQL https://bit.ly/mastering-spark-sql
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
Follow me at https://twitter.com/jaceklaskowski


On Fri, Sep 7, 2018 at 10:05 PM Russell Spitzer <russell.spitzer@gmail.com>
wrote:

> That's my understanding :) doExecute is for non-codegen execution, while
> doProduce and doConsume are for generating code
>
> On Fri, Sep 7, 2018 at 2:59 PM Jacek Laskowski <jacek@japila.pl> wrote:
>
>> Hi Devs,
>>
>> Sorry for bothering you with my questions (and concerns), but I really
>> need to understand this piece of code (= my personal challenge :))
>>
>> Is it true that SparkPlan.doExecute (to "execute" a physical operator)
>> is only used when whole-stage codegen is disabled (which it is not by
>> default)? May I call this execution path traditional (even "old-fashioned")?
>>
>> Is it true that these days CodegenSupport.doProduce and CodegenSupport.doConsume
>> (and others) are used for "executing" a physical operator (i.e. to generate
>> the Java source code), since whole-stage code generation is enabled by
>> default and is currently the proper execution path?
>>
>> P.S. SparkPlan.doExecute is still used by WholeStageCodegenExec (and
>> InputAdapter) to trigger whole-stage codegen, but that's the only code
>> executed through doExecute, isn't it?
>>
>> Regards,
>> Jacek Laskowski
>>
>>
>> On Fri, Sep 7, 2018 at 7:24 PM Jacek Laskowski <jacek@japila.pl> wrote:
>>
>>> Hi Spark Devs,
>>>
>>> I really need your help understanding the relationship
>>> between HashAggregateExec, TungstenAggregationIterator and
>>> UnsafeFixedWidthAggregationMap.
>>>
>>> While exploring UnsafeFixedWidthAggregationMap and how it's used, I've
>>> noticed that it's used exclusively by HashAggregateExec and
>>> TungstenAggregationIterator. Given that TungstenAggregationIterator is used
>>> exclusively in HashAggregateExec, and that both use
>>> UnsafeFixedWidthAggregationMap in almost (if not exactly) the same way,
>>> I've got a question I cannot seem to answer myself.
>>>
>>> Since HashAggregateExec supports whole-stage codegen,
>>> HashAggregateExec.doExecute won't be used at all (unless codegen is
>>> disabled); doProduce and doConsume are used instead. Is that correct?
>>>
>>> If so, TungstenAggregationIterator is not used at all, and
>>> UnsafeFixedWidthAggregationMap is used directly instead (in the generated
>>> Java code that calls createHashMap or finishAggregate). Is that correct?
>>>
>>> Regards,
>>> Jacek Laskowski
>>>
>>
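
The two questions in the first message can be pictured with a toy model in which one map of fixed-width aggregation buffers is driven either by an iterator object (the interpreted path) or by a plain loop that calls the map directly (what generated code does). This is a sketch under stated assumptions, not Spark's code: FixedWidthAggMap, AggregationIterator, update and generated_aggregate are hypothetical, and the real UnsafeFixedWidthAggregationMap works on UnsafeRow keys and buffers in Tungsten-managed memory, which this model ignores.

```python
# Toy model (NOT Spark's classes): one hash map of fixed-width aggregation
# buffers, driven either by an iterator class or by a plain loop.

class FixedWidthAggMap:
    """Hypothetical stand-in for UnsafeFixedWidthAggregationMap: maps a
    grouping key to a fixed-size mutable buffer, here [sum, count]."""

    def __init__(self, empty_buffer):
        self.empty_buffer = list(empty_buffer)
        self.buffers = {}

    def get_buffer(self, key):
        # Return the key's buffer, inserting a copy of the empty buffer if absent.
        buf = self.buffers.get(key)
        if buf is None:
            buf = list(self.empty_buffer)
            self.buffers[key] = buf
        return buf

    def items(self):
        # Insertion order here is a Python dict detail, not a Spark guarantee.
        return iter(self.buffers.items())


def update(buf, value):
    buf[0] += value  # running sum
    buf[1] += 1      # running count


class AggregationIterator:
    """Interpreted path, loosely analogous to TungstenAggregationIterator:
    consumes all input into the map, then yields one result per key."""

    def __init__(self, rows):
        self.map = FixedWidthAggMap([0, 0])
        for key, value in rows:
            update(self.map.get_buffer(key), value)
        self._it = self.map.items()

    def __iter__(self):
        return self

    def __next__(self):
        key, (total, count) = next(self._it)  # StopIteration ends iteration
        return key, total, count


def generated_aggregate(rows):
    """Codegen path analogue: a straight-line loop that uses the map directly."""
    agg_map = FixedWidthAggMap([0, 0])
    for key, value in rows:
        update(agg_map.get_buffer(key), value)
    return [(k, total, count) for k, (total, count) in agg_map.items()]


rows = [("a", 1), ("b", 5), ("a", 3)]
print(list(AggregationIterator(rows)))  # [('a', 4, 2), ('b', 5, 1)]
print(generated_aggregate(rows))        # [('a', 4, 2), ('b', 5, 1)]
```

In this model the map is the shared piece and the iterator class is only a driver around it, which is the relationship the question asks about.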
