spark-issues mailing list archives

From "Reynold Xin (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (SPARK-16331) [SQL] Reduce code generation time
Date Fri, 01 Jul 2016 04:48:10 GMT

     [ https://issues.apache.org/jira/browse/SPARK-16331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin resolved SPARK-16331.
---------------------------------
       Resolution: Fixed
         Assignee: Hiroshi Inoue
    Fix Version/s: 2.1.0

> [SQL] Reduce code generation time 
> ----------------------------------
>
>                 Key: SPARK-16331
>                 URL: https://issues.apache.org/jira/browse/SPARK-16331
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.0, 2.1.0
>            Reporter: Hiroshi Inoue
>            Assignee: Hiroshi Inoue
>             Fix For: 2.1.0
>
>
> During code generation, a {{LocalRelation}} often has a huge {{Vector}} object as
> {{data}}. In the simple example below, a {{LocalRelation}} has a {{Vector}} of 1000000
> {{UnsafeRow}} elements.
> {quote}
> val numRows = 1000000
> val ds = (1 to numRows).toDS().persist()
> benchmark.addCase("filter+reduce") { iter =>
>   ds.filter(a => (a & 1) == 0).reduce(_ + _)
> }
> {quote}
> At {{TreeNode.transformChildren}}, all elements of the vector are unnecessarily iterated
> over to check whether the vector contains any children, since {{Vector}} is Traversable.
> This iteration significantly increases code generation time.
> This patch avoids this overhead by checking the number of children before iterating over
> all elements; {{LocalRelation}} does not have children since it extends {{LeafNode}}.
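
The idea of checking for children before scanning constructor arguments can be sketched in isolation. The following is a minimal illustration only, not Spark's actual {{TreeNode}} code: the names {{Node}}, {{Leaf}}, {{Wrapper}}, {{scanNaive}}, {{scanGuarded}} and the {{inspected}} counter are hypothetical, and {{Leaf}} merely stands in for a leaf node that carries a large collection the way {{LocalRelation}} carries {{data}}.

```scala
object TransformChildrenSketch {
  // Counts how many argument elements get inspected (for illustration only).
  var inspected: Long = 0L

  // Hypothetical, simplified stand-in for a tree node.
  sealed trait Node extends Product {
    def children: Seq[Node]
  }

  // Leaf node holding a large payload and no children.
  final case class Leaf(data: Vector[Int]) extends Node {
    val children: Seq[Node] = Nil
  }

  // Node with exactly one child.
  final case class Wrapper(child: Node) extends Node {
    val children: Seq[Node] = Seq(child)
  }

  // Naive scan: walks every constructor argument, including every element
  // of any collection argument, as if looking for child nodes.
  def scanNaive(node: Node): Unit =
    node.productIterator.foreach {
      case xs: Iterable[_] => xs.foreach(_ => inspected += 1)
      case _               => inspected += 1
    }

  // Guarded scan: a node that reports no children is skipped entirely,
  // so its large collection argument is never traversed.
  def scanGuarded(node: Node): Unit =
    if (node.children.nonEmpty) scanNaive(node)

  def main(args: Array[String]): Unit = {
    val leaf = Leaf(Vector.range(0, 1000000))

    inspected = 0
    scanNaive(leaf)
    println(s"naive scan inspected $inspected elements")

    inspected = 0
    scanGuarded(leaf)
    println(s"guarded scan inspected $inspected elements")
  }
}
```

In this sketch the naive scan touches every element of the leaf's vector, while the guarded scan touches none, because the cost of the emptiness check does not depend on the size of the payload.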
> The performance of the above example:
> {quote}
> without this patch
> Java HotSpot(TM) 64-Bit Server VM 1.8.0_91-b14 on Mac OS X 10.11.5
> Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
> compilationTime:                         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------
> filter+reduce                                 4426 / 4533          0.2        4426.0       1.0X
> with this patch
> compilationTime:                         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------
> filter+reduce                                 3117 / 3391          0.3        3116.6       1.0X
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

