spark-issues mailing list archives

From "Seth Hendrickson (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-17134) Use level 2 BLAS operations in LogisticAggregator
Date Fri, 23 Sep 2016 06:09:20 GMT

    [ https://issues.apache.org/jira/browse/SPARK-17134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15515529#comment-15515529 ]

Seth Hendrickson edited comment on SPARK-17134 at 9/23/16 6:09 AM:
-------------------------------------------------------------------

This makes sense. In my initial testing, I found that standardizing the features in every
iteration takes a non-trivial amount of time. Still, you mentioned wanting to avoid caching
the standardized dataset, since it can create unnecessary memory overhead. One solution is
to let users specify that their data has already been standardized, so that we can skip the
extra divisions in the update method. Alternatively, we could do as you suggest above, but
store the coefficients in column-major order so that we still maximize cache hits.

We'll need some testing for both cases to truly understand this.
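
To make the two options concrete, here is a rough sketch in plain Python, not the actual
LogisticAggregator code. It assumes a flat coefficient array in column-major order (the
coefficient for class k and feature j sits at index j * num_classes + k), a sparse instance
given as (index, value) pairs, and a hypothetical `standardized` flag that skips the
per-feature division. All names here are made up for illustration.

```python
def margins_column_major(coef, num_classes, features, feature_std, standardized=False):
    """Compute the per-class margins for one sparse instance.

    coef         -- flat list in column-major order: the coefficient for
                    (class k, feature j) is coef[j * num_classes + k]
    features     -- list of (feature_index, value) pairs for nonzero features
    feature_std  -- per-feature standard deviations
    standardized -- hypothetical flag: when True, the values are assumed to be
                    scaled already, so the division is skipped
    """
    margins = [0.0] * num_classes
    for j, value in features:
        if not standardized:
            std = feature_std[j]
            if std == 0.0:
                # A constant feature carries no information; skip it.
                continue
            value = value / std
        # All coefficients for feature j are contiguous, so this inner loop
        # walks one short, sequential slice of the coefficient array.
        base = j * num_classes
        for k in range(num_classes):
            margins[k] += coef[base + k] * value
    return margins


# Two classes, three features; feature 1 is zero for this instance.
coef = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
instance = [(0, 2.0), (2, 4.0)]
std = [2.0, 1.0, 4.0]

scaled_on_the_fly = margins_column_major(coef, 2, instance, std)
already_scaled = margins_column_major(coef, 2, instance, std, standardized=True)
```

With the flag set, the loop does no divisions at all. Without it, the column-major layout
still keeps the inner loop over classes on contiguous memory. Which option wins in practice
is exactly what the testing would need to show.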


was (Author: sethah):
This makes sense. In my initial testing I found that having to standardize the features in
every iteration takes a non-trivial amount of time. Still, you mentioned the desire to not
cache the standardized dataset since it can create unnecessary memory overhead. One solution
is to allow the users to specify that there data has already been standardized, and then we
don't have to perform the extra divisions in the update method. Alternatively, we could do
as you suggest above, but store the coefficients in column major order in order to still maximize
cache hits.

We'll need some testing for both cases to truly understand this.

> Use level 2 BLAS operations in LogisticAggregator
> -------------------------------------------------
>
>                 Key: SPARK-17134
>                 URL: https://issues.apache.org/jira/browse/SPARK-17134
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>            Reporter: Seth Hendrickson
>
> Multinomial logistic regression uses the LogisticAggregator class for gradient updates. We
> should look into refactoring MLOR to use level 2 BLAS operations for the updates. Performance
> testing should be done to show improvements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

