spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Edoardo Vacchi <uncommonnonse...@gmail.com>
Subject Re: [Catalyst] RFC: Using PartialFunction literals instead of objects
Date Wed, 20 May 2015 10:08:08 GMT
Thanks for the prompt feedback; I have further expanded on your
suggestions on this JIRA
https://issues.apache.org/jira/browse/SPARK-7754

On Tue, May 19, 2015 at 8:35 PM, Michael Armbrust
<michael@databricks.com> wrote:
> Overall this seems like a reasonable proposal to me.  Here are a few
> thoughts:
>
>  - There is some debugging utility to the ruleName, so we would probably
> want to at least make that an argument to the rule function.
>  - We also have had rules that operate on SparkPlan, though since there is
> only one ATM maybe we don't need sugar there.
>  - I would not call the sugar for creating Strategies rule/seqrule, as I
> think the one-to-one vs one-to-many distinction is useful.
>  - I'm generally pro-refactoring to make the code nicer, especially when its
> not official public API, but I do think its important to maintain source
> compatibility (which I think you are) when possible as there are other
> projects using catalyst.
>  - Finally, we'll have to balance this with other code changes / conflicts.
>
> You should probably open a JIRA and we can continue the discussion there.
>
> On Tue, May 19, 2015 at 4:16 AM, Edoardo Vacchi <uncommonnonsense@gmail.com>
> wrote:
>>
>> Hi everybody,
>>
>> At the moment, Catalyst rules are defined using two different types of
>> rules:
>> `Rule[LogicalPlan]` and `Strategy` (which in turn maps to
>> `GenericStrategy[SparkPlan]`).
>>
>> I propose to introduce utility methods to
>>
>>   a) reduce the boilerplate to define rewrite rules
>>   b) turning them back into what they essentially represent: function
>> types.
>>
>> These changes would be backwards compatible, and would greatly help in
>> understanding what the code does. Personally, I feel like the current
>> use of objects is redundant and possibly confusing.
>>
>> ## `Rule[LogicalPlan]`
>>
>> The analyzer and optimizer use `Rule[LogicalPlan]`, which, besides
>> defining a default `val ruleName`
>> only defines the method `apply(plan: TreeType): TreeType`.
>> Because the body of such method is always supposed to read `plan match
>> pf`, with `pf`
>> being some `PartialFunction[LogicalPlan, LogicalPlan]`, we can
>> conclude that `Rule[LogicalPlan]`
>> might be substituted by a PartialFunction.
>>
>> I propose the following:
>>
>> a) Introduce the utility method
>>
>>     def rule(pf: PartialFunction[LogicalPlan, LogicalPlan]):
>> Rule[LogicalPlan] =
>>       new Rule[LogicalPlan] {
>>         def apply (plan: LogicalPlan): LogicalPlan = plan transform pf
>>       }
>>
>> b) progressively replace the boilerplate-y object definitions; e.g.
>>
>>     object MyRewriteRule extends Rule[LogicalPlan] {
>>       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
>>         case ... => ...
>>     }
>>
>> with
>>
>>     // define a Rule[LogicalPlan]
>>     val MyRewriteRule = rule {
>>       case ... => ...
>>     }
>>
>> it might also be possible to make rule method `implicit`, thereby
>> further reducing MyRewriteRule to:
>>
>>     // define a PartialFunction[LogicalPlan, LogicalPlan]
>>     // the implicit would convert it into a Rule[LogicalPlan] at the use
>> sites
>>     val MyRewriteRule = {
>>       case ... => ...
>>     }
>>
>>
>> ## Strategies
>>
>> A similar solution could be applied to shorten the code for
>> Strategies, which are total functions
>> only because they are all supposed to manage the default case,
>> possibly returning `Nil`. In this case
>> we might introduce the following utility methods:
>>
>> /**
>>  * Generate a Strategy from a PartialFunction[LogicalPlan, SparkPlan].
>>  * The partial function must therefore return *one single* SparkPlan
>> for each case.
>>  * The method will automatically wrap them in a [[Seq]].
>>  * Unhandled cases will automatically return Seq.empty
>>  */
>> protected def rule(pf: PartialFunction[LogicalPlan, SparkPlan]): Strategy
>> =
>>   new Strategy {
>>     def apply(plan: LogicalPlan): Seq[SparkPlan] =
>>       if (pf.isDefinedAt(plan)) Seq(pf.apply(plan)) else Seq.empty
>>   }
>>
>> /**
>>  * Generate a Strategy from a PartialFunction[ LogicalPlan, Seq[SparkPlan]
>> ].
>>  * The partial function must therefore return a Seq[SparkPlan] for each
>> case.
>>  * Unhandled cases will automatically return Seq.empty
>>  */
>> protected def seqrule(pf: PartialFunction[LogicalPlan,
>> Seq[SparkPlan]]): Strategy =
>>   new Strategy {
>>     def apply(plan: LogicalPlan): Seq[SparkPlan] =
>>       if (pf.isDefinedAt(plan)) pf.apply(plan) else Seq.empty[SparkPlan]
>>   }
>>
>> Thanks in advance
>> e.v.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>> For additional commands, e-mail: dev-help@spark.apache.org
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Mime
View raw message