spark-issues mailing list archives

From "Richard Wilkinson (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-23978) Kryo much slower when mllib jar not on classpath
Date Fri, 13 Apr 2018 12:49:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-23978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Wilkinson updated SPARK-23978:
--------------------------------------
       Priority: Minor  (was: Major)
    Description: 
Spark 2.3 added a number of org.apache.spark.ml and org.apache.spark.mllib classes to the
Kryo registration, but it does this via Class.forName.

If the mllib jar is not on the classpath, this can be very slow.
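The registration pattern described above can be sketched in plain Java. This is a simplified illustration, not Spark's actual KryoSerializer code: the class list, the class `OptionalRegistration` and the method `resolveOptional` are hypothetical, and the Kryo `register` call is replaced by collecting the resolved classes into a list so the example runs without Kryo on the classpath. The point it shows is that every name that is not on the classpath still pays for a full classloader search and a thrown `ClassNotFoundException` before it is ignored.

```java
import java.util.ArrayList;
import java.util.List;

public class OptionalRegistration {
    // Hypothetical, shortened list of names. The last entry stands in for an mllib
    // class and is absent when the mllib jar is not on the classpath.
    static final String[] OPTIONAL_CLASSES = {
        "java.util.ArrayList",
        "java.util.HashMap",
        "org.apache.spark.mllib.linalg.DenseVector"
    };

    // Resolves each name via Class.forName and silently skips names that cannot be
    // found. Each miss costs a classloader search plus a ClassNotFoundException.
    static List<Class<?>> resolveOptional(String[] names) {
        List<Class<?>> found = new ArrayList<>();
        for (String name : names) {
            try {
                // In a real serializer, this is where the class would be registered.
                found.add(Class.forName(name));
            } catch (ClassNotFoundException e) {
                // Class not present on the classpath: ignore it and move on.
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<Class<?>> resolved = resolveOptional(OPTIONAL_CLASSES);
        System.out.println("resolved " + resolved.size() + " of " + OPTIONAL_CLASSES.length);
    }
}
```

On a plain JVM without the mllib jar, this prints `resolved 2 of 3`: the missing class is skipped, but only after the failed lookup has been paid for.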

My app, which uses the GraphX connected components function, is 2x slower in 2.3 than in 2.2.1.

I have attached jVisualVM stats for both cases; you can see that a large amount of time is spent
in Utils.classForName. While debugging, I traced this to the Kryo initialization.
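Because the lookups happen during Kryo initialization, their cost is presumably paid again each time a Kryo instance is set up, rather than once per application. Under that assumption, one possible mitigation is to memoize the outcome of each lookup, including misses, so repeated initializations reuse it. The sketch below is hypothetical (the names `CachedClassLookup`, `lookup` and `initOnce` are invented for illustration) and is not necessarily how this issue was resolved in Spark.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class CachedClassLookup {
    // Remembers both hits and misses for the lifetime of the JVM, keyed by class name.
    private static final Map<String, Optional<Class<?>>> CACHE = new ConcurrentHashMap<>();

    // Counts how many times Class.forName actually ran (for demonstration only).
    static final AtomicInteger forNameCalls = new AtomicInteger();

    // Returns the class if it exists; the expensive lookup runs at most once per name.
    static Optional<Class<?>> lookup(String name) {
        return CACHE.computeIfAbsent(name, n -> {
            forNameCalls.incrementAndGet();
            try {
                Class<?> c = Class.forName(n);
                return Optional.<Class<?>>of(c);
            } catch (ClassNotFoundException e) {
                return Optional.<Class<?>>empty();
            }
        });
    }

    // Hypothetical stand-in for one serializer initialization: counts how many of the
    // optional classes would be registered.
    static int initOnce(String[] names) {
        int registered = 0;
        for (String name : names) {
            if (lookup(name).isPresent()) {
                registered++;
            }
        }
        return registered;
    }

    public static void main(String[] args) {
        String[] names = {"java.util.ArrayList", "org.apache.spark.mllib.linalg.DenseVector"};
        for (int i = 0; i < 1000; i++) {
            initOnce(names);
        }
        // 1000 simulated initializations, but only one Class.forName call per name.
        System.out.println("Class.forName calls: " + forNameCalls.get());
    }
}
```

This prints `Class.forName calls: 2`. The trade-off of caching misses is that a class which becomes loadable later (for example, a jar added to the classloader at runtime) would not be picked up by this cache.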

  was:
Spark 2.3 added a number of org.apache.spark.ml and org.apache.spark.mllib classes to the
Kryo registration, but it does this via Class.forName.

If the mllib jar is not on the classpath, this can be very slow.

My app, which uses the GraphX connected components function, is 2x slower in 2.3 than in 2.2.1.



> Kryo much slower when mllib jar not on classpath
> ------------------------------------------------
>
>                 Key: SPARK-23978
>                 URL: https://issues.apache.org/jira/browse/SPARK-23978
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.0
>         Environment: Windows 10, Java 8
>            Reporter: Richard Wilkinson
>            Priority: Minor
>         Attachments: kryo_stats.png
>
>
> Spark 2.3 added a number of org.apache.spark.ml and org.apache.spark.mllib classes to the Kryo registration, but it does this via Class.forName.
> If the mllib jar is not on the classpath, this can be very slow.
> My app, which uses the GraphX connected components function, is 2x slower in 2.3 than in 2.2.1.
> I have attached jVisualVM stats for both cases; you can see that a large amount of time is spent in Utils.classForName. While debugging, I traced this to the Kryo initialization.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

