spark-dev mailing list archives

From DB Tsai <dbt...@alpinenow.com>
Subject Re: MLLib - Thoughts about refactoring Updater for LBFGS?
Date Tue, 04 Mar 2014 23:10:16 GMT
Hi Xiangrui,

It seems that Robert has been busy recently. I set up org.riso on
Maven Central for him and waited a while for his response, but there
was no news. So I decided to maintain it myself.

I'm more in favor of using breeze as the core math library for both
sparse support and optimization. However, when I tried L-BFGS in breeze
on the iris dataset with multinomial logistic regression, it converged
to NaN for unknown reasons, while the Fortran one converged to a
solution similar to the one from Newton's method. For some datasets,
the L-BFGS in breeze does converge to the correct solution.
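
The cause of the NaN is not identified above. One thing worth ruling
out, independent of the optimizer, is overflow in the multinomial loss
itself: on the JVM, Math.exp of a large margin returns Infinity, and
Infinity - Infinity is NaN. The sketch below is only an assumption
about where to look first, not a diagnosis of breeze. It is written in
Python for brevity, and the names log_sum_exp and multinomial_loss are
hypothetical; they are not from breeze or MLlib.

```python
import math


def log_sum_exp(margins):
    """Return log(sum(exp(m))) without overflow by shifting by the max."""
    shift = max(margins)
    return shift + math.log(sum(math.exp(m - shift) for m in margins))


def multinomial_loss(weights, features, label):
    """Negative log-likelihood of one example.

    weights:  one list of weights per class (hypothetical layout)
    features: list of floats
    label:    index of the true class
    """
    # One margin (dot product) per class.
    margins = [sum(w * x for w, x in zip(row, features)) for row in weights]
    # Stable form of log(sum_k exp(margin_k)) - margin_label.
    return log_sum_exp(margins) - margins[label]
```

If a line search probes a very large step, the margins can become huge
for one iteration. With the shifted form the loss stays finite, so the
line search can back off instead of propagating NaN.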

We may talk to David to understand the implementation differences
between them.

I agree that we need to rework the updater to make it generic;
however, it may take a very long time to design it right. How about
we open another story for this? Once we finish this PR (using either
the Fortran or the breeze implementation) and get it merged, let's have
a design discussion around this. That may be more effective, since we
can design an architecture that has to work for both cases in the
codebase, and it will be easier to think about the edge cases.
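
To make the design question concrete, here is one possible shape for a
more generic updater: the regularizer exposes its value and its
gradient separately, so a first-order step and a quasi-Newton solver
can both consume the same objective without a workaround to recover
regGrad. This is a minimal sketch under my own assumptions, written in
Python for brevity. L2Regularizer, regularized_objective, and
gradient_descent_step are hypothetical names, not the MLlib API.

```python
class L2Regularizer:
    """Hypothetical regularizer that exposes both value and gradient."""

    def __init__(self, reg_param):
        self.reg_param = reg_param

    def value(self, weights):
        # 0.5 * regParam * ||w||^2
        return 0.5 * self.reg_param * sum(w * w for w in weights)

    def gradient(self, weights):
        # Gradient of the value above: regParam * w
        return [self.reg_param * w for w in weights]


def regularized_objective(loss_and_grad, regularizer):
    """Combine a data loss with a regularizer.

    loss_and_grad(weights) must return (loss, gradient). The returned
    function has the same shape, so any solver that needs (value,
    gradient) pairs, such as L-BFGS, can call it directly.
    """
    def objective(weights):
        loss, grad = loss_and_grad(weights)
        reg_grad = regularizer.gradient(weights)
        total_grad = [g + r for g, r in zip(grad, reg_grad)]
        return loss + regularizer.value(weights), total_grad
    return objective


def gradient_descent_step(objective, weights, step_size):
    """A plain first-order update using the same objective."""
    _, grad = objective(weights)
    return [w - step_size * g for w, g in zip(weights, grad)]
```

A non-smooth penalty such as L1 does not fit this shape as is, which is
one of the edge cases a design discussion would need to cover.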

Thanks.

Sincerely,

DB Tsai
Machine Learning Engineer
Alpine Data Labs
--------------------------------------
Web: http://alpinenow.com/


On Tue, Mar 4, 2014 at 9:53 AM, Xiangrui Meng <mengxr@gmail.com> wrote:
> Hi DB,
>
> I saw you released the L-BFGS code under com.dbtsai.lbfgs on maven
> central, so I assume that Robert (the author of RISO) is not going to
> maintain it. Is it correct?
>
> For the breeze implementation, do you mind sharing more details about
> the issues you have?
>
> I saw the hack you did to get regGrad in the PR. Though well
> documented, it still increases the code complexity. I agree with Evan
> that we should make the current implementation more general for first
> order updates. Maybe we should spend some time on this direction.
>
> Best,
> Xiangrui
>
> On Tue, Mar 4, 2014 at 7:25 AM, Debasish Das <debasish.das83@gmail.com> wrote:
>> Yeah we should move f2j L-BFGS and L-BFGS-B to breeze..they already have 2
>> line searches..also the OWL-QN outline...
>>
>> Hi Xiangrui,
>>
>> What's the plan on the PR ?
>> https://github.com/apache/incubator-spark/pull/575
>>
>> Will you add breeze as a dependency for the sparse support ?
>>
>> I looked at your branch
>> https://github.com/mengxr/incubator-spark/tree/sparse and the code is using
>> mahout wrapper.
>>
>> I can add a branch which updates GLM with breeze sparse matrices in case
>> you are fine with breeze license and other issues that we discussed on the
>> PR.
>>
>> Thanks.
>> Deb
>>
>>
>>
>> On Mon, Mar 3, 2014 at 10:47 PM, DB Tsai <dbtsai@alpinenow.com> wrote:
>>
>>> Hi Deb,
>>>
>>> I tried the breeze L-BFGS algorithm a couple of weeks ago, and it
>>> was not as stable as the Fortran implementation. I guess the
>>> problem is related to the line search. Since we may bring in the
>>> breeze dependency for sparse format support, as you pointed out, we
>>> can just try to fix the L-BFGS in breeze, and then we also get
>>> OWL-QN and L-BFGS-B.
>>>
>>> What do you think?
>>>
>>> Thanks.
>>>
>>> Sincerely,
>>>
>>> DB Tsai
>>> Machine Learning Engineer
>>> Alpine Data Labs
>>> --------------------------------------
>>> Web: http://alpinenow.com/
>>>
>>>
>>> On Mon, Mar 3, 2014 at 3:52 PM, DB Tsai <dbtsai@alpinenow.com> wrote:
>>> > Hi Deb,
>>> >
>>> >> a.  OWL-QN for solving L1 natively in BFGS
>>> > Based on what I saw from
>>> > https://github.com/tjhunter/scalanlp-core/blob/master/learn/src/main/scala/breeze/optimize/OWLQN.scala
>>> > , it seems that it's not difficult to implement OWL-QN once LBFGS is
>>> > done.
>>> >
>>> >>
>>> >> b.  Bound constraints in BFGS : I saw you have converted the fortran
>>> >> code. Is there a license issue ? I can help in getting that up to
>>> >> speed as well.
>>> > I tried to convert the Fortran L-BFGS-B implementation to Java
>>> > using f2j; the translated code is just a mess, and it doesn't
>>> > work at all. There is no license issue here. Any idea about
>>> > how to approach this?
>>> >
>>> >> c. Few variants of line searches : I will discuss on it.
>>> >> For the dbtsai-lbfgs branch seems like it already got merged by Jenkins.
>>> > I don't think it's merged into master. There are still a couple of
>>> > things that need to be cleaned up. I just opened the PR to get
>>> > public feedback.
>>> >
>>> >> Is this getting merged to the master or there will be revisions on it?
>>> >>
>>> >> https://github.com/apache/spark/pull/53
>>> >>
>>> >> Thanks.
>>> >> Deb
>>> >
>>> > Sincerely,
>>> >
>>> > DB Tsai
>>> > Machine Learning Engineer
>>> > Alpine Data Labs
>>> > --------------------------------------
>>> > Web: http://alpinenow.com/
>>>
