spark-dev mailing list archives

From Debasish Das <debasish.da...@gmail.com>
Subject Re: mllib.recommendation Design
Date Wed, 25 Mar 2015 14:59:59 GMT
Hi Xiangrui,

I am running into some minor issues implementing Alternating Nonlinear
Minimization (ALM), as documented in this JIRA, because the ALS code lives
in the ml package: https://issues.apache.org/jira/browse/SPARK-6323

I need to use Vectors.fromBreeze / Vectors.toBreeze, but they are
package-private in mllib. For now I have removed the private modifier, but
I am not sure that is the correct approach...

I also need to reuse a lot of building blocks from ml.ALS, so I am
writing ALM in the ml package...

I thought the plan was to keep writing core algorithms in mllib and do the
pipeline integration in ml... It would be great if you could move the ALS
object from ml to mllib; that way I could also move ALM to mllib (which I
feel is the right place)... Of course, the Pipeline-based flow would stay
in the ml package...

We can decide later whether ALM belongs in recommendation or in a separate
package called factorization, but the idea is that ALM will support MAP
(and maybe KL divergence loss) with sparsity constraints (probability
simplex and bounds are enough for what I am focused on right now)...
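
To make the two constraints mentioned above concrete, here is a minimal
Python sketch of the Euclidean projections a proximal solver would apply to
a factor vector: one onto box bounds and one onto the probability simplex.
This is only an illustration. The function names project_box and
project_simplex are hypothetical and are not the Spark or Breeze API, and
the simplex projection uses the standard sort-and-threshold method rather
than whatever the actual solver implements.

```python
from typing import List


def project_box(v: List[float], lower: float, upper: float) -> List[float]:
    """Clamp every entry of v into [lower, upper] (bound constraint)."""
    return [min(max(x, lower), upper) for x in v]


def project_simplex(v: List[float], radius: float = 1.0) -> List[float]:
    """Euclidean projection of v onto {x : x >= 0, sum(x) == radius}.

    Sort-and-threshold method: find the shift theta such that the
    shifted, clipped entries sum to radius.
    """
    u = sorted(v, reverse=True)
    cumsum = 0.0
    theta = 0.0
    for k, uk in enumerate(u, start=1):
        cumsum += uk
        t = (cumsum - radius) / k
        # The condition holds for a prefix of the sorted entries;
        # the last t for which it holds is the final threshold.
        if uk - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


if __name__ == "__main__":
    # Hypothetical factor row, used only for illustration.
    row = [0.9, -0.2, 0.6]
    print(project_box(row, 0.0, 0.5))
    print(project_simplex(row))
```

Both projections are cheap (linear and O(n log n) in the rank), which is
why they are convenient as per-row constraints inside an alternating
scheme.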

Thanks.
Deb

On Tue, Feb 17, 2015 at 4:40 PM, Debasish Das <debasish.das83@gmail.com>
wrote:

> There is a usability difference... I am not sure whether recommendation.ALS
> would want to add both userConstraint and productConstraint. GraphLab CF,
> for example, has both, and we are ready to support all the features for
> modest ranks where Gram matrices can be formed...
>
> For large ranks I am still working on the code.
>
> On Tue, Feb 17, 2015 at 3:19 PM, Xiangrui Meng <mengxr@gmail.com> wrote:
>
>> The current ALS implementation allows pluggable solvers for
>> NormalEquation, where we put CholeskySolver and the NNLS solver. Please
>> check the current implementation and let us know how your constraint
>> solver would fit. For a general matrix factorization package, let's
>> make a JIRA and move our discussion there. -Xiangrui
>>
>> On Fri, Feb 13, 2015 at 7:46 AM, Debasish Das <debasish.das83@gmail.com>
>> wrote:
>> > Hi,
>> >
>> > I am a bit confused about the mllib design in master. I thought that core
>> > algorithms would stay in mllib and ml would define the pipelines over the
>> > core algorithms, but it looks like in master ALS has moved from mllib to
>> > ml...
>> >
>> > I am refactoring my PR into a factorization package, and I want to build
>> > it on top of ml.recommendation.ALS (possibly extending
>> > ml.recommendation.ALS, since the first version will use RDD handling very
>> > similar to ALS and a proximal solver that is being added to breeze).
>> >
>> > https://issues.apache.org/jira/browse/SPARK-2426
>> > https://github.com/scalanlp/breeze/pull/321
>> >
>> > Basically, I am not sure whether we should merge it with
>> > recommendation.ALS, since this is more generic than recommendation. I am
>> > considering calling it ConstrainedALS, where the user can specify
>> > different constraints for user and product factors (similar to the
>> > GraphLab CF structure).
>> >
>> > I am also working on ConstrainedALM, where the underlying algorithm is no
>> > longer ALS but nonlinear alternating minimization with constraints.
>> > https://github.com/scalanlp/breeze/pull/364
>> > This will let us do large-rank matrix completion, where there is no need
>> > to construct Gram matrices. I will open the JIRA soon, after getting
>> > initial results.
>> >
>> > I am a bit confused about where I should add the factorization package.
>> > It will use the current ALS test cases, and I will have to construct more
>> > test cases for sparse coding and PLSA formulations.
>> >
>> > Thanks.
>> > Deb
>>
>
>
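
The pluggable-solver idea from the quoted thread (one normal-equation
subproblem, interchangeable solvers) can be sketched in plain Python as
follows. This is an illustration under stated assumptions, not Spark's
code: cholesky_solve, projected_gradient_nnls and solve_normal_equation are
hypothetical names, and the nonnegative solver is a simple projected
gradient loop standing in for a real NNLS routine.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]
Vector = List[float]


def cholesky_solve(ata: Matrix, atb: Vector) -> Vector:
    """Solve ata * x = atb for a symmetric positive definite ata."""
    n = len(atb)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = ata[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    # Forward substitution: low * y = atb
    y = [0.0] * n
    for i in range(n):
        y[i] = (atb[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    # Back substitution: low^T * x = y
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def projected_gradient_nnls(ata: Matrix, atb: Vector, steps: int = 500) -> Vector:
    """Minimise 0.5 * x'(ata)x - atb'x subject to x >= 0."""
    n = len(atb)
    # The trace of an SPD matrix bounds its largest eigenvalue,
    # so 1 / trace is a safe (if conservative) step size.
    step = 1.0 / sum(ata[i][i] for i in range(n))
    x = [0.0] * n
    for _ in range(steps):
        grad = [sum(ata[i][j] * x[j] for j in range(n)) - atb[i] for i in range(n)]
        x = [max(x[i] - step * grad[i], 0.0) for i in range(n)]
    return x


def solve_normal_equation(
    ata: Matrix, atb: Vector, solver: Callable[[Matrix, Vector], Vector]
) -> Vector:
    """The alternating loop only sees this call; the solver is swappable."""
    return solver(ata, atb)


if __name__ == "__main__":
    # Hypothetical 2x2 Gram matrix and right-hand side, for illustration only.
    gram = [[2.0, 0.0], [0.0, 2.0]]
    rhs = [2.0, -2.0]
    print(solve_normal_equation(gram, rhs, cholesky_solve))
    print(solve_normal_equation(gram, rhs, projected_gradient_nnls))
```

A constrained solver (for example one applying a simplex or bound
projection at each step) would fit the same call signature, which is the
question raised in the thread.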
