spark-dev mailing list archives

From Koert Kuipers <ko...@tresata.com>
Subject Re: benefits of code gen
Date Fri, 10 Feb 2017 22:08:21 GMT
yes agreed. however i believe nullSafeEval is not used for codegen?

On Fri, Feb 10, 2017 at 4:56 PM, Michael Armbrust <michael@databricks.com>
wrote:

> Function1 is specialized, but nullSafeEval is Any => Any, so that's still
> going to box in the non-codegened execution path.
>
> On Fri, Feb 10, 2017 at 1:32 PM, Koert Kuipers <koert@tresata.com> wrote:
>
>> based on that i take it that math functions would be primary
>> beneficiaries since they work on primitives.
>>
>> so if i take UnaryMathExpression as an example, would i not get the same
>> benefit if i change it to this?
>>
>> abstract class UnaryMathExpression(val f: Double => Double, name: String)
>>   extends UnaryExpression with Serializable with ImplicitCastInputTypes {
>>
>>   override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType)
>>   override def dataType: DataType = DoubleType
>>   override def nullable: Boolean = true
>>   override def toString: String = s"$name($child)"
>>   override def prettyName: String = name
>>
>>   protected override def nullSafeEval(input: Any): Any = {
>>     f(input.asInstanceOf[Double])
>>   }
>>
>>   // name of function in java.lang.Math
>>   def funcName: String = name.toLowerCase
>>
>>   def function(d: Double): Double = f(d)
>>
>>   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
>>     val self = ctx.addReferenceObj(name, this, getClass.getName)
>>     defineCodeGen(ctx, ev, c => s"$self.function($c)")
>>   }
>> }
>>
>> admittedly in this case the benefit in terms of removing complex codegen
>> is not there (the codegen was only one line), but if i can remove codegen
>> here i could also remove it in lots of other places where it does get very
>> unwieldy simply by replacing it with calls to methods.
>>
>> Function1 is specialized, so i think (or hope) that my version does no
>> extra boxing/unboxing.
>>
>> On Fri, Feb 10, 2017 at 2:24 PM, Reynold Xin <rxin@databricks.com> wrote:
>>
>>> With complex types it doesn't work as well, but for primitive types the
>>> biggest benefit of whole stage codegen is that we don't even need to put
>>> the intermediate data into rows or columns anymore. They are just variables
>>> (stored in CPU registers).
>>>
>>> On Fri, Feb 10, 2017 at 8:22 PM, Koert Kuipers <koert@tresata.com>
>>> wrote:
>>>
>>>> so i have been looking for a while now at all the catalyst expressions,
>>>> and all the relatively complex codegen going on.
>>>>
>>>> so first off i get the benefit of codegen to turn a bunch of chained
>>>> iterator transformations into a single codegen stage for spark. that makes
>>>> sense to me, because it avoids a bunch of overhead.
>>>>
>>>> but what i am not so sure about is what the benefit is of converting
>>>> the actual stuff that happens inside the iterator transformations into
>>>> codegen.
>>>>
>>>> say if we have an expression that has 2 children and creates a struct
>>>> for them. why would this be faster in codegen by re-creating the code to
>>>> do this in a string (which is complex and error prone) compared to simply
>>>> have the codegen call the normal method for this in my class?
>>>>
>>>> i see so much trivial code be re-created in codegen. stuff like this:
>>>>
>>>>   private[this] def castToDateCode(
>>>>       from: DataType,
>>>>       ctx: CodegenContext): CastFunction = from match {
>>>>     case StringType =>
>>>>       val intOpt = ctx.freshName("intOpt")
>>>>       (c, evPrim, evNull) => s"""
>>>>         scala.Option<Integer> $intOpt =
>>>>           org.apache.spark.sql.catalyst.util.DateTimeUtils.stringToDate($c);
>>>>         if ($intOpt.isDefined()) {
>>>>           $evPrim = ((Integer) $intOpt.get()).intValue();
>>>>         } else {
>>>>           $evNull = true;
>>>>         }
>>>>        """
>>>>
>>>> is this really faster than simply calling an equivalent function from
>>>> the codegen, and keeping the codegen logic restricted to the "unrolling"
>>>> of chained iterators?
>>>>
>>>>
>>>
>>
>
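
for illustration only, a minimal sketch of the boxing point discussed in this thread. it is not spark code: BoxingSketch, nullSafeEvalLike and function are hypothetical names, and the sketch assumes nothing beyond the scala standard library. it only shows that a method typed Any => Any has to hand back a boxed java.lang.Double, while a Double => Double method keeps the value primitive at the call site.

```scala
object BoxingSketch {
  // Hypothetical stand-in for an interpreted eval path: the signature is
  // Any => Any, so the Double result is boxed into java.lang.Double on return.
  def nullSafeEvalLike(f: Double => Double)(input: Any): Any =
    f(input.asInstanceOf[Double])

  // Hypothetical stand-in for the method that generated code would call:
  // primitive double in, primitive double out.
  def function(f: Double => Double)(d: Double): Double = f(d)

  def main(args: Array[String]): Unit = {
    val sqrt: Double => Double = d => math.sqrt(d)

    val boxed: Any = nullSafeEvalLike(sqrt)(4.0)
    val primitive: Double = function(sqrt)(4.0)

    // The Any-typed result is a boxed java.lang.Double at runtime.
    println(boxed.isInstanceOf[java.lang.Double])
    println(primitive)
  }
}
```

whether the jit removes the boxing in the first path is a separate question that this sketch does not measure.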
