spark-issues mailing list archives

From "Weichen Xu (JIRA)" <>
Subject [jira] [Commented] (SPARK-10078) Vector-free L-BFGS
Date Thu, 12 Jan 2017 02:43:16 GMT


Weichen Xu commented on SPARK-10078:

[~debasish83] But when we implemented VF-LBFGS/VF-OWLQN on Spark, we found that many optimizations
need to combine Spark features closely with the optimizer algorithm, so an abstract interface
supporting distributed vectors (for example, vector space operators such as dot, add, and scale,
plus persist/unpersist operators and so on) does not seem to be enough.
Here are two simple problems that show the complexity of designing a general interface:
1. Look at this VF-OWLQN implementation based on Spark:
We know that OWLQN internally computes the pseudo-gradient for L1 regularization. Look at the
function `calculateComponentWithL1`: while computing the pseudo-gradient on RDDs, it also
uses an accumulator (a feature only Spark has) to calculate the adjusted fnValue. So would the
abstract interface need to contain something like Spark's `accumulator`?
2. The persist, unpersist, and checkpoint problem in Spark. Because Spark computes lazily,
an improper persist/unpersist/checkpoint order may cause serious problems (RDD recomputation,
a checkpoint that takes no effect, and so on). To see this complexity, we can take
a look at the VF-BFGS implementation on Spark:
It uses the pattern "persist the current step's RDDs, then unpersist the previous step's RDDs",
like many other algorithms in Spark MLlib. The difficulty is that Spark always computes lazily:
when you persist an RDD, it is not persisted immediately; persistence is postponed until an
action is called on the RDD. If the internal code calls `unpersist` too early, an RDD that has
not yet been computed or persisted ends up already unpersisted.
This behavior may be very different from other distributed platforms, so can a general interface
really handle this problem correctly and still stay efficient?
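
To make the ordering hazard in point 2 concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark's API: `LazyDataset`, `run_steps`, and `compute_count` are hypothetical names invented for this illustration. The only assumption is the behavior described above, that `persist` just records the intent to cache and the cache is filled when an action runs.

```python
class LazyDataset:
    """Toy stand-in for a lazily evaluated, cacheable dataset (NOT Spark's RDD)."""

    def __init__(self, compute):
        self._compute = compute     # function that produces the data on demand
        self._want_cache = False    # set by persist(), cleared by unpersist()
        self._cache = None          # filled only when an action runs
        self.compute_count = 0      # how many times the data was (re)computed

    def persist(self):
        # Lazy: only records the intent to cache; nothing is computed here.
        self._want_cache = True
        return self

    def unpersist(self):
        # Drops both the cached data and the intent to cache.
        self._want_cache = False
        self._cache = None
        return self

    def map(self, fn):
        # Lazy transformation: the child recomputes from the parent on demand.
        parent = self
        return LazyDataset(lambda: [fn(v) for v in parent.collect()])

    def collect(self):
        # An "action": triggers computation and fills the cache if requested.
        if self._cache is not None:
            return self._cache
        self.compute_count += 1
        data = self._compute()
        if self._want_cache:
            self._cache = data
        return data


def run_steps(unpersist_before_action):
    """One optimizer-like step: build `cur` from `prev`, then release `prev`."""
    prev = LazyDataset(lambda: [1.0, 2.0, 3.0]).persist()
    prev.collect()                               # previous step is materialized
    cur = prev.map(lambda v: v + 1.0).persist()  # current step, not computed yet
    if unpersist_before_action:
        prev.unpersist()   # too early: `cur` still needs `prev`
        cur.collect()
    else:
        cur.collect()      # materialize `cur` first ...
        prev.unpersist()   # ... then it is safe to release `prev`
    cur.collect()          # later reuse of `cur`
    return prev.compute_count, cur.compute_count
```

In this toy, `run_steps(False)` computes each dataset once, while `run_steps(True)` computes `prev` a second time because its cache was dropped before `cur` was materialized. A general interface would have to either expose this ordering to the optimizer or hide it behind something that forces materialization.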
[~sethah] Did you consider these detailed problems when designing the general optimizer interface?
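
For point 1, the following plain-Python sketch shows why a side channel like an accumulator is convenient there. It is not the code of `calculateComponentWithL1`; it uses the standard OWL-QN pseudo-gradient rule, and `Accumulator`, `pseudo_gradient_partition`, and `pseudo_gradient` are hypothetical names, with lists of lists standing in for partitions. The point is that the L1 term of the adjusted function value is collected in the same pass that produces the pseudo-gradient.

```python
class Accumulator:
    """Toy add-only counter (a stand-in, NOT Spark's accumulator API)."""

    def __init__(self):
        self.value = 0.0

    def add(self, x):
        self.value += x


def pseudo_gradient_partition(x_part, g_part, l1_reg, l1_acc):
    """Pseudo-gradient for one partition.

    Side effect: adds l1_reg * |x_i| for every coordinate to l1_acc, so the
    adjusted function value needs no second pass over the data.
    """
    out = []
    for x, g in zip(x_part, g_part):
        l1_acc.add(l1_reg * abs(x))
        if x > 0:
            out.append(g + l1_reg)      # |x| is differentiable, slope +1
        elif x < 0:
            out.append(g - l1_reg)      # |x| is differentiable, slope -1
        elif g + l1_reg < 0:
            out.append(g + l1_reg)      # right derivative is negative
        elif g - l1_reg > 0:
            out.append(g - l1_reg)      # left derivative is positive
        else:
            out.append(0.0)             # zero is already optimal here
    return out


def pseudo_gradient(x_parts, g_parts, fn_value, l1_reg):
    """Returns (adjusted function value, pseudo-gradient partitions)."""
    acc = Accumulator()
    pg_parts = [
        pseudo_gradient_partition(xp, gp, l1_reg, acc)
        for xp, gp in zip(x_parts, g_parts)
    ]
    return fn_value + acc.value, pg_parts
```

A platform without accumulators would need a separate reduction over the distributed vector to get the same adjusted value, which is the kind of platform-specific choice a general interface would have to make room for.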

> Vector-free L-BFGS
> ------------------
>                 Key: SPARK-10078
>                 URL:
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>            Reporter: Xiangrui Meng
>            Assignee: Yanbo Liang
> This is to implement a scalable version of vector-free L-BFGS (
> Design document:

This message was sent by Atlassian JIRA
