spark-user mailing list archives

From Xiangrui Meng <men...@gmail.com>
Subject Re: MLLib Linear regression
Date Tue, 07 Oct 2014 22:11:39 GMT
Did you test different regularization parameters and step sizes? In
the combinations that work, I don't see "A + D". Did you test that
combination? Is there any linear dependency between A's columns and
D's columns? -Xiangrui
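One way to answer the linear-dependency question on a small sample of the data is to compare ranks: the column spaces of A and D share a non-zero vector exactly when rank([A D]) < rank(A) + rank(D). The sketch below is a hypothetical illustration in plain Python (not MLlib code) using exact Gaussian elimination on tiny dense toy matrices; the function names and the data are made up, and a dense check like this would not scale to the roughly 21K columns of A and D together.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a small dense matrix (list of rows), via exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Look for a pivot at or below the current rank row with a non-zero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # this column adds no new direction
        m[rank], m[pivot] = m[pivot], m[rank]
        # Zero out this column in every row below the pivot row.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            if factor != 0:
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def hstack(left, right):
    """Place two matrices with the same rows side by side, like [A D]."""
    return [left_row + right_row for left_row, right_row in zip(left, right)]

def blocks_overlap(a_rows, d_rows):
    """True if span(columns of A) and span(columns of D) share a non-zero vector."""
    combined = matrix_rank(hstack(a_rows, d_rows))
    return combined < matrix_rank(a_rows) + matrix_rank(d_rows)

# Hypothetical toy data: 4 samples; A and D each have 2 feature columns.
A = [[1, 0], [0, 1], [1, 1], [2, 0]]
D_indep = [[0, 0], [0, 0], [1, 0], [0, 1]]  # columns not reachable from A's columns
D_dep = [[1, 0], [1, 0], [2, 0], [2, 1]]    # first column is the sum of A's two columns

print("A vs D_indep overlap:", blocks_overlap(A, D_indep))  # False
print("A vs D_dep overlap:", blocks_overlap(A, D_dep))      # True
```

When such a dependency exists, the unregularized least-squares solution is not unique, which is one reason the regularization parameter is worth varying alongside the step size.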

On Tue, Oct 7, 2014 at 1:56 PM, Sameer Tilak <sstilak@live.com> wrote:
> BTW, one detail:
>
> When the number of iterations is 100, all weights are zero or negative and
> the indices are only from set A.
>
> When the number of iterations is 150, I see 30+ non-zero weights (when sorted
> by weight) and the indices are distributed across all sets. However, the MSE
> is high (5.xxx) and the result does not match the domain knowledge.
>
> When the number of iterations is 400, I see 30+ non-zero weights (when sorted
> by weight) and the indices are distributed across all sets. However, the MSE
> is high (6.xxx) and the result does not match the domain knowledge.
>
> Any help will be highly appreciated.
>
>
> ________________________________
> From: sstilak@live.com
> To: user@spark.apache.org
> Subject: MLLib Linear regression
> Date: Tue, 7 Oct 2014 13:41:03 -0700
>
>
> Hi All,
> I have the following classes of features:
>
> class A: 15000 features
> class B: 170 features
> class C: 900 features
> class D: 6000 features
>
> I use linear regression (over sparse data). I get excellent results with low
> RMSE (~0.06) for the following combinations of classes:
> 1. A + B + C
> 2. B + C + D
> 3. A + B
> 4. A + C
> 5. B + D
> 6. C + D
> 7. D
>
> Unfortunately, when I use A + B + C + D (all the features) I get results
> that don't make any sense: all weights are zero or negative and the indices
> are only from set A. I also get a high MSE. I changed the number of iterations
> from 100 to 150, 250, or even 400, and I still get an MSE of about 5 to 6. Are
> there any other parameters that I can play with? Any insight on what could be
> wrong? Could it be that it is somehow unable to scale up to 22K features? (I
> highly doubt that.)
>
>
>
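Xiangrui's suggestion to vary the step size and regularization, rather than only the number of iterations, can be illustrated with a small self-contained sketch. It is plain Python, not MLlib: full-batch gradient descent on a least-squares objective with an optional L2 penalty, a step size that decays as step_size / sqrt(t) (an assumption of this sketch), and a grid over (step_size, reg_param), which loosely play the role of the stepSize and regParam arguments of MLlib's SGD-based regression trainers. All function names and the toy data are hypothetical. One plausible but unconfirmed link to the symptom in this thread: adding feature columns never decreases the largest eigenvalue of XᵀX/n, so a step size that is stable for a subset of the classes can be too large for A + B + C + D.

```python
import math

def predict(w, row):
    return sum(wi * xi for wi, xi in zip(w, row))

def train_mse(X, y, w):
    """Mean squared error of weights w on (X, y); inf if the weights blew up."""
    if not all(math.isfinite(v) for v in w):
        return math.inf
    errors = [predict(w, row) - yi for row, yi in zip(X, y)]
    mse = sum(e * e for e in errors) / len(X)
    return mse if math.isfinite(mse) else math.inf

def gd_least_squares(X, y, step_size, reg_param, num_iterations):
    """Full-batch gradient descent on (1/2n)*||Xw - y||^2 + (reg_param/2)*||w||^2."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for t in range(1, num_iterations + 1):
        residuals = [predict(w, row) - yi for row, yi in zip(X, y)]
        grad = [sum(r * row[j] for r, row in zip(residuals, X)) / n + reg_param * w[j]
                for j in range(d)]
        eta = step_size / math.sqrt(t)  # decaying step size: an assumption of this sketch
        w = [wj - eta * gj for wj, gj in zip(w, grad)]
        if not all(math.isfinite(v) for v in w):
            break  # diverged; stop early
    return w

def grid_search(X, y, step_sizes, reg_params, num_iterations):
    """Train once per (step_size, reg_param) pair; return the pair with the lowest MSE."""
    results = {}
    for s in step_sizes:
        for r in reg_params:
            w = gd_least_squares(X, y, s, r, num_iterations)
            results[(s, r)] = train_mse(X, y, w)
    best = min(results, key=results.get)
    return best, results

# Hypothetical toy data: y = 2*x0 - x1 exactly.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2.0, -1.0, 1.0, 3.0]

best, results = grid_search(X, y, [0.1, 1.0, 10.0], [0.0, 0.1], num_iterations=200)
for params in sorted(results):
    print(params, results[params])
print("best (step_size, reg_param):", best)
```

On this toy data, a step size of 10.0 leaves an enormous MSE even after 200 iterations, while 1.0 without a penalty fits almost exactly and is reported as the best pair; 0.1 converges too slowly, and the 0.1 penalty trades some training error for smaller weights.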
