spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Edoardo Vacchi <>
Subject [Catalyst] RFC: Using PartialFunction literals instead of objects
Date Tue, 19 May 2015 11:16:36 GMT
Hi everybody,

At the moment, Catalyst rules are defined using two different types of rules:
`Rule[LogicalPlan]` and `Strategy` (which in turn maps to

I propose to introduce utility methods to

  a) reduce the boilerplate to define rewrite rules
  b) turning them back into what they essentially represent: function types.

These changes would be backwards compatible, and would greatly help in
understanding what the code does. Personally, I feel like the current
use of objects is redundant and possibly confusing.

## `Rule[LogicalPlan]`

The analyzer and optimizer use `Rule[LogicalPlan]`, which, besides
defining a default `val ruleName`
only defines the method `apply(plan: TreeType): TreeType`.
Because the body of such method is always supposed to read `plan match
pf`, with `pf`
being some `PartialFunction[LogicalPlan, LogicalPlan]`, we can
conclude that `Rule[LogicalPlan]`
might be substituted by a PartialFunction.

I propose the following:

a) Introduce the utility method

    def rule(pf: PartialFunction[LogicalPlan, LogicalPlan]): Rule[LogicalPlan] =
      new Rule[LogicalPlan] {
        def apply (plan: LogicalPlan): LogicalPlan = plan transform pf

b) progressively replace the boilerplate-y object definitions; e.g.

    object MyRewriteRule extends Rule[LogicalPlan] {
      def apply(plan: LogicalPlan): LogicalPlan = plan transform {
        case ... => ...


    // define a Rule[LogicalPlan]
    val MyRewriteRule = rule {
      case ... => ...

it might also be possible to make rule method `implicit`, thereby
further reducing MyRewriteRule to:

    // define a PartialFunction[LogicalPlan, LogicalPlan]
    // the implicit would convert it into a Rule[LogicalPlan] at the use sites
    val MyRewriteRule = {
      case ... => ...

## Strategies

A similar solution could be applied to shorten the code for
Strategies, which are total functions
only because they are all supposed to manage the default case,
possibly returning `Nil`. In this case
we might introduce the following utility methods:

 * Generate a Strategy from a PartialFunction[LogicalPlan, SparkPlan].
 * The partial function must therefore return *one single* SparkPlan
for each case.
 * The method will automatically wrap them in a [[Seq]].
 * Unhandled cases will automatically return Seq.empty
protected def rule(pf: PartialFunction[LogicalPlan, SparkPlan]): Strategy =
  new Strategy {
    def apply(plan: LogicalPlan): Seq[SparkPlan] =
      if (pf.isDefinedAt(plan)) Seq(pf.apply(plan)) else Seq.empty

 * Generate a Strategy from a PartialFunction[ LogicalPlan, Seq[SparkPlan] ].
 * The partial function must therefore return a Seq[SparkPlan] for each case.
 * Unhandled cases will automatically return Seq.empty
protected def seqrule(pf: PartialFunction[LogicalPlan,
Seq[SparkPlan]]): Strategy =
  new Strategy {
    def apply(plan: LogicalPlan): Seq[SparkPlan] =
      if (pf.isDefinedAt(plan)) pf.apply(plan) else Seq.empty[SparkPlan]

Thanks in advance

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message