spark-dev mailing list archives

From Jacek Laskowski <ja...@japila.pl>
Subject Re: GenerateExec, CodegenSupport and supportCodegen flag off?!
Date Tue, 12 Dec 2017 08:55:00 GMT
Hi,

It appears that there's already a discussion about why the GenerateExec
operator has the supportCodegen flag off:

1. https://issues.apache.org/jira/browse/SPARK-21657 "Spark has exponential
time complexity to explode(array of structs)", which is in progress.
2. More importantly, @rxin turned it off, stating: "Disable generate
codegen since it fails my workload." I wish he had included the workload
to showcase the issue :(
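To make the effect of the flag concrete, here is a toy model of the stage-collapsing logic. It assumes the rule behaves like CollapseCodegenStages in Spark 2.x: an operator joins a WholeStageCodegenExec stage only when it is a CodegenSupport *and* its supportCodegen returns true. Otherwise it runs row-based, and its codegen-capable children start their own stage. The classes below (Op, WholeStage, Adapter, CollapseStages) are hypothetical stand-ins, not Spark's API. The real rule also checks expressions and field counts, which this sketch leaves out.

```scala
// Toy model of the CollapseCodegenStages decision. These are hypothetical local
// stand-ins, NOT Spark's SparkPlan / CodegenSupport / WholeStageCodegenExec classes.
sealed trait Plan {
  def name: String
  def children: Seq[Plan]
  def withChildren(newChildren: Seq[Plan]): Plan
}

// A physical operator. `extendsCodegenSupport` models "extends CodegenSupport";
// `supportCodegen` models the overridable flag (true by default, as in Spark).
final case class Op(
    name: String,
    children: Seq[Plan] = Nil,
    extendsCodegenSupport: Boolean = true,
    supportCodegen: Boolean = true) extends Plan {
  def withChildren(newChildren: Seq[Plan]): Plan = copy(children = newChildren)
}

// Marks a whole-stage codegen boundary (stand-in for WholeStageCodegenExec).
final case class WholeStage(child: Plan) extends Plan {
  val name = "WholeStageCodegen"
  def children: Seq[Plan] = Seq(child)
  def withChildren(newChildren: Seq[Plan]): Plan = WholeStage(newChildren.head)
}

// Feeds rows from a non-codegen subtree into a stage (stand-in for InputAdapter).
final case class Adapter(child: Plan) extends Plan {
  val name = "InputAdapter"
  def children: Seq[Plan] = Seq(child)
  def withChildren(newChildren: Seq[Plan]): Plan = Adapter(newChildren.head)
}

object CollapseStages {
  // An operator can be fused only if it mixes in the trait AND its flag is on.
  def canCodegen(p: Plan): Boolean = p match {
    case op: Op => op.extendsCodegenSupport && op.supportCodegen
    case _      => false
  }

  // Open a new stage at the topmost codegen-capable operator; otherwise recurse.
  def insertWholeStage(p: Plan): Plan =
    if (canCodegen(p)) WholeStage(insertAdapters(p))
    else p.withChildren(p.children.map(insertWholeStage))

  // Inside a stage, a child that cannot be fused is cut off behind an adapter,
  // and a fresh stage search starts beneath it.
  def insertAdapters(p: Plan): Plan =
    if (!canCodegen(p)) Adapter(insertWholeStage(p))
    else p.withChildren(p.children.map(insertAdapters))

  // Render the tree as nested names, e.g. "A(B(C))".
  def render(p: Plan): String =
    if (p.children.isEmpty) p.name
    else s"${p.name}(${p.children.map(render).mkString(", ")})"
}
```

For example, `Op("Project", Seq(Op("Generate", Seq(Op("Filter", Seq(Op("Scan")))), supportCodegen = false)))` renders as `WholeStageCodegen(Project(InputAdapter(Generate(WholeStageCodegen(Filter(Scan))))))`. Generate sits outside both stages even though it "is" a CodegenSupport. With the flag on, the whole chain fuses into one stage: `WholeStageCodegen(Project(Generate(Filter(Scan))))`.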

It looks like a bunch of wise people are already on it, so I'll just
listen...

Regards,
Jacek Laskowski
----
https://about.me/JacekLaskowski
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

On Mon, Dec 11, 2017 at 10:15 PM, Jacek Laskowski <jacek@japila.pl> wrote:

> Hi,
>
> After another day of trying to get my head around WholeStageCodegenExec,
> InputAdapter and the CollapseCodegenStages optimization rule, I came to the
> conclusion that it may have something to do with UnsafeRow vs
> GenericInternalRow/InternalRow: when a physical operator wants to
> _somehow_ participate in whole-stage codegen, it can extend the
> CodegenSupport trait and enable accessing GenericInternalRow by turning
> the supportCodegen flag off.
>
> I understand how badly that may read, but without help from Spark SQL
> devs that's all I can figure out myself. Any help is appreciated.
>
> Regards,
> Jacek Laskowski
>
> On Sun, Dec 10, 2017 at 10:34 PM, Stephen Boesch <javadba@gmail.com> wrote:
>
>> A relevant observation: last year there was a JIRA, now closed/executed,
>> to remove the option to disable the codegen flag (and the unsafe flag as
>> well): https://issues.apache.org/jira/browse/SPARK-11644
>>
>> 2017-12-10 13:16 GMT-08:00 Jacek Laskowski <jacek@japila.pl>:
>>
>>> Hi,
>>>
>>> I'm wondering why a physical operator like GenerateExec would
>>> extend CodegenSupport [1] but have the supportCodegen flag turned off [2].
>>>
>>> What's the meaning of such a combination: being a CodegenSupport with
>>> supportCodegen off?
>>>
>>> [1] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/GenerateExec.scala#L58-L64
>>>
>>> [2] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/GenerateExec.scala#L125
>>>
>>> Regards,
>>> Jacek Laskowski
