spark-dev mailing list archives

From Jacek Laskowski <ja...@japila.pl>
Subject Re: Need help with HashAggregateExec, TungstenAggregationIterator and UnsafeFixedWidthAggregationMap
Date Sat, 08 Sep 2018 20:43:25 GMT
Hi Herman,

Right. No @deprecated, but something that would tell people who review the
code "be extra careful since you're reading code that is no longer in use"
for SparkPlans that do support WSCG (whole-stage code generation). That
would help a lot, as I have been tricked a few times already while trying
to understand something that I should not have bothered much with.

Thanks Russ and Herman for your help to get my thinking right. That will
also help my Spark clients, esp. during Spark SQL workshops!

Regards,
Jacek Laskowski
----
https://about.me/JacekLaskowski
Mastering Spark SQL https://bit.ly/mastering-spark-sql
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
Follow me at https://twitter.com/jaceklaskowski


On Sat, Sep 8, 2018 at 3:53 PM Herman van Hovell <herman@databricks.com>
wrote:

> ...pressed send too early...
>
> Moreover, we can't always use whole-stage code generation. In that case
> we fall back to Volcano-style execution and chain together doExecute()
> calls.
>
> On Sat, Sep 8, 2018 at 3:51 PM Herman van Hovell <herman@databricks.com>
> wrote:
>
>> SparkPlan.doExecute() is the only way you can execute a physical SQL
>> plan, so it should *not* be marked as deprecated. Whole-stage code
>> generation collapses a subtree of SparkPlans (that support whole-stage
>> code generation) into a single WholeStageCodegenExec physical plan.
>> During execution we call doExecute() on the WholeStageCodegenExec node.
>>
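
The contrast described above (chained doExecute() calls versus one collapsed node) can be illustrated with a toy model. This is only a sketch and does not use any Spark classes: FilterIterator, ProjectIterator, volcanoSum and fusedSum are hypothetical names made up for the illustration, and the code Spark actually generates is considerably more involved. The first path chains per-operator iterators the way Volcano-style execution does; the second is the kind of single fused loop that collapsing a filter, a projection and an aggregate into one unit would roughly correspond to.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.IntPredicate;
import java.util.function.IntUnaryOperator;

// Toy model only: none of these types exist in Spark.
final class ExecutionStyles {

    // Volcano style: a filter operator pulls rows from its child one at a time.
    static final class FilterIterator implements Iterator<Integer> {
        private final Iterator<Integer> child;
        private final IntPredicate predicate;
        private Integer buffered; // next matching row, or null if none is buffered

        FilterIterator(Iterator<Integer> child, IntPredicate predicate) {
            this.child = child;
            this.predicate = predicate;
        }

        @Override public boolean hasNext() {
            while (buffered == null && child.hasNext()) {
                Integer row = child.next();
                if (predicate.test(row)) buffered = row;
            }
            return buffered != null;
        }

        @Override public Integer next() {
            if (!hasNext()) throw new NoSuchElementException();
            Integer row = buffered;
            buffered = null;
            return row;
        }
    }

    // Volcano style: a projection operator transforms each row pulled from its child.
    static final class ProjectIterator implements Iterator<Integer> {
        private final Iterator<Integer> child;
        private final IntUnaryOperator fn;

        ProjectIterator(Iterator<Integer> child, IntUnaryOperator fn) {
            this.child = child;
            this.fn = fn;
        }

        @Override public boolean hasNext() { return child.hasNext(); }
        @Override public Integer next() { return fn.applyAsInt(child.next()); }
    }

    // Operators chained together; every row crosses two iterator boundaries.
    static long volcanoSum(int[] rows) {
        Iterator<Integer> scan = Arrays.stream(rows).boxed().iterator();
        Iterator<Integer> plan =
            new ProjectIterator(new FilterIterator(scan, x -> x % 2 == 0), x -> x * 10);
        long sum = 0;
        while (plan.hasNext()) sum += plan.next();
        return sum;
    }

    // The same three operators written as one fused loop, with no per-operator iterators.
    static long fusedSum(int[] rows) {
        long sum = 0;
        for (int x : rows) {
            if (x % 2 == 0) sum += x * 10;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] rows = {1, 2, 3, 4, 5, 6};
        System.out.println(volcanoSum(rows)); // 120
        System.out.println(fusedSum(rows));   // 120
    }
}
```

Both paths compute the same result; the difference is only in how the work is organized, which is why the iterator-based path still has to exist as a fallback in this toy model.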
>> On Sat, Sep 8, 2018 at 11:55 AM Jacek Laskowski <jacek@japila.pl> wrote:
>>
>>> Thanks Russ! That helps a lot.
>>>
>>> On the other hand, it makes reviewing the codebase of Spark SQL slightly
>>> harder since Java code generation is so much about string concatenation :(
>>>
>>> p.s. Should all the code in doExecute be considered and marked
>>> @deprecated?
>>>
>>> Regards,
>>> Jacek Laskowski
>>>
>>> On Fri, Sep 7, 2018 at 10:05 PM Russell Spitzer <
>>> russell.spitzer@gmail.com> wrote:
>>>
>>>> That's my understanding :) doExecute is for non-codegen, while doProduce
>>>> and doConsume are for generating code
>>>>
>>>> On Fri, Sep 7, 2018 at 2:59 PM Jacek Laskowski <jacek@japila.pl> wrote:
>>>>
>>>>> Hi Devs,
>>>>>
>>>>> Sorry for bothering you with my questions (and concerns), but I really
>>>>> need to understand this piece of code (= my personal challenge :))
>>>>>
>>>>> Is it true that SparkPlan.doExecute (to "execute" a physical
>>>>> operator) is only used when whole-stage code gen is disabled (which
>>>>> it is not by default)? May I call this execution path traditional
>>>>> (even "old-fashioned")?
>>>>>
>>>>> Is it true that these days SparkPlan.doProduce and
>>>>> SparkPlan.doConsume (and others) are used for "executing" a physical
>>>>> operator (i.e. to generate the Java source code) since whole-stage code
>>>>> generation is enabled and is currently the proper execution path?
>>>>>
>>>>> p.s. This SparkPlan.doExecute is used to trigger whole-stage code gen
>>>>> by WholeStageCodegenExec (and InputAdapter), but that's all the code
>>>>> that is to be executed by doExecute, isn't it?
>>>>>
>>>>> Regards,
>>>>> Jacek Laskowski
>>>>>
>>>>> On Fri, Sep 7, 2018 at 7:24 PM Jacek Laskowski <jacek@japila.pl>
>>>>> wrote:
>>>>>
>>>>>> Hi Spark Devs,
>>>>>>
>>>>>> I really need your help understanding the relationship
>>>>>> between HashAggregateExec, TungstenAggregationIterator and
>>>>>> UnsafeFixedWidthAggregationMap.
>>>>>>
>>>>>> While exploring UnsafeFixedWidthAggregationMap and how it's used,
>>>>>> I've noticed that it's for HashAggregateExec and
>>>>>> TungstenAggregationIterator exclusively. And given that
>>>>>> TungstenAggregationIterator is used exclusively in HashAggregateExec
>>>>>> and the use of UnsafeFixedWidthAggregationMap in both seems to be
>>>>>> almost the same (if not the same), I've got a question I cannot
>>>>>> seem to answer myself.
>>>>>>
>>>>>> Since HashAggregateExec supports Whole-Stage Codegen,
>>>>>> HashAggregateExec.doExecute won't be used at all; doConsume and
>>>>>> doProduce will be used instead (unless codegen is disabled). Is that
>>>>>> correct?
>>>>>>
>>>>>> If so, TungstenAggregationIterator is not used at all, but
>>>>>> UnsafeFixedWidthAggregationMap is used directly instead (in the Java
>>>>>> code that uses createHashMap or finishAggregate). Is that correct?
>>>>>>
>>>>>> Regards,
>>>>>> Jacek Laskowski
>>>>>>
>>>>>
