# spark-user mailing list archives

From Zhiliang Zhu <zchl.j...@yahoo.com.INVALID>
Subject Re: [SPARK MLLIB] could not understand the wrong and inscrutable result of Linear Regression codes
Date Mon, 26 Oct 2015 07:27:17 GMT
```
Hi Meihua, DB Tsai,
Thanks very much for all your kind help. After I added some more LabeledPoint entries to the training
data, the output also seems much better. I will also try the setFitIntercept(false)
approach.

Currently I have encountered an optimization problem: f(x1, x2, ..., xn)
= a11 * x1 * x1 + a12 * x1 * x2 + a22 * x2 * x2 + ... + ann * xn * xn, with the constraints
b1 * x1 + b2 * x2 + ... + bn * xn = 1, xi >= 0, etc. The goal is to find the x = [x1, x2, ..., xn]
that makes f(x1, x2, ..., xn) the biggest.

It is required to use Spark to solve it; however, I am not familiar with using Spark directly for
optimization problems, and I am not yet skilled at applying gradient descent to a
multi-dimensional function. If you know this kind of problem, would you please comment on it?
Yes, I then converted this problem into one of solving systems of linear equations c1
* w1 + c2 * w2 + ... + cn * wn = d. I just view c and w conversely, as w1 * c1 + w2 * c2 +
... + wn * cn = d, so that w becomes the coefficient and c becomes the variable.
I think Spark Linear Regression would be helpful here.
Expert Sujit also kindly helped point out the way to compute the pseudo-inverse of A for Ax
= b; I will also try it next.
Since I would use Spark to solve the problem, and as you said breeze should be used here, would you
please explain or point me to how to use it here...

Thank you very much!
Zhiliang

On Monday, October 26, 2015 2:58 PM, Meihua Wu <rotationsymmetry14@gmail.com> wrote:

LinearRegression by default includes an intercept in the model, e.g.
label = intercept + features dot weight

To get the result you want, you need to force the intercept to be zero.

Just curious, are you trying to solve systems of linear equations? If
so, you can probably try breeze.

On Sun, Oct 25, 2015 at 9:10 PM, Zhiliang Zhu
<zchl.jump@yahoo.com.invalid> wrote:
>
>
>
> On Monday, October 26, 2015 11:26 AM, Zhiliang Zhu
> <zchl.jump@yahoo.com.INVALID> wrote:
>
>
> Hi DB Tsai,
>
> Thanks very much for your kind help. I  get it now.
>
> I am sorry that there is another issue: the weight/coefficient result is
> perfect when A is a triangular matrix; however, when A is not a triangular
> matrix (but is transformed from a triangular matrix and is still invertible),
> the result does not seem perfect, and it is difficult to improve by
> resetting the parameters.
> Would you please comment on that...
>
> List<LabeledPoint> localTraining = Lists.newArrayList(
>      new LabeledPoint(30.0, Vectors.dense(1.0, 2.0, 3.0, 4.0)),
>      new LabeledPoint(29.0, Vectors.dense(0.0, 2.0, 3.0, 4.0)),
>      new LabeledPoint(25.0, Vectors.dense(0.0, 0.0, 3.0, 4.0)),
>      new LabeledPoint(-3.0, Vectors.dense(0.0, 0.0, -1.0, 0.0)));
> ...
> LinearRegression lr = new LinearRegression()
>      .setMaxIter(20000)
>      .setRegParam(0)
>      .setElasticNetParam(0);
> ....
>
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> It seems that no matter how I reset the parameters for lr, the outputs of
> x3 and x4 are always nearly the same.
> Is there some way to make the result a little better...
>
>
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> x3 and x4 could not become better, the output is:
> Final w:
> [0.9999999477672867,1.9999999748740578,3.5000000112393734,3.500000011239377]
>
> Thank you,
> Zhiliang
>
>
>
> On Monday, October 26, 2015 10:25 AM, DB Tsai <dbtsai@dbtsai.com> wrote:
>
>
> Column 4 is always constant, so it has no predictive power, resulting in a zero weight.
>
> On Sunday, October 25, 2015, Zhiliang Zhu <zchl.jump@yahoo.com> wrote:
>
> Hi DB Tsai,
>
>
> As for your comment, I just modified and tested the key part of the codes:
>
>  LinearRegression lr = new LinearRegression()
>        .setMaxIter(10000)
>        .setRegParam(0)
>        .setElasticNetParam(0);  //the number could be reset
>
>  final LinearRegressionModel model = lr.fit(training);
>
> Now the output is much more reasonable; however, x4 is always 0 no matter how
> I reset those parameters in lr. Would you please help with how to properly
> set the parameters ...
>
> Final w: [1.000000127825909,1.999999979185054,2.999999993307136,0.0]
>
> Thank you,
> Zhiliang
>
>
>
>
> On Monday, October 26, 2015 5:14 AM, DB Tsai <dbtsai@dbtsai.com> wrote:
>
>
> LinearRegressionWithSGD is not stable. Please use linear regression in
> http://spark.apache.org/docs/latest/ml-linear-methods.html
>
> Sincerely,
>
> DB Tsai
> ----------------------------------------------------------
> Web: https://www.dbtsai.com
> PGP Key ID: 0xAF08DF8D
>
>
> On Sun, Oct 25, 2015 at 10:14 AM, Zhiliang Zhu
> <zchl.jump@yahoo.com.invalid> wrote:
>> Dear All,
>>
>> I have a program, shown below, whose result confuses me very much and seems
>> inscrutable. It is about a multi-dimensional linear regression model: the
>> weight / coefficient is always perfect when the dimension is smaller than
>> 4; otherwise it is wrong all the time.
>> Or, should something else be selected instead of LinearRegressionWithSGD?
>>
>> public class JavaLinearRegression {
>>  public static void main(String[] args) {
>>    SparkConf conf = new SparkConf().setAppName("Linear Regression Example");
>>    JavaSparkContext sc = new JavaSparkContext(conf);
>>    SQLContext jsql = new SQLContext(sc);
>>
>>    //Ax = b, x = [1, 2, 3, 4] would be the only one output about weight
>>    //x1 + 2 * x2 + 3 * x3 + 4 * x4 = y would be the multiple linear mode
>>    List<LabeledPoint> localTraining = Lists.newArrayList(
>>        new LabeledPoint(30.0, Vectors.dense(1.0, 2.0, 3.0, 4.0)),
>>        new LabeledPoint(29.0, Vectors.dense(0.0, 2.0, 3.0, 4.0)),
>>        new LabeledPoint(25.0, Vectors.dense(0.0, 0.0, 3.0, 4.0)),
>>        new LabeledPoint(16.0, Vectors.dense(0.0, 0.0, 0.0, 4.0)));
>>
>>    JavaRDD<LabeledPoint> training = sc.parallelize(localTraining).cache();
>>
>>    // Building the model
>>    int numIterations = 1000; //the number could be reset large
>>    final LinearRegressionModel model =
>>        LinearRegressionWithSGD.train(JavaRDD.toRDD(training), numIterations);
>>
>>    //the coefficient weights are perfect while dimension of LabeledPoint is SMALLER than 4.
>>    //otherwise the output is always wrong and inscrutable.
>>    //for instance, one output is
>>    //Final w: [2.537341836047772E25,-7.744333206289736E24,6.697875883454909E23,-2.6704705246777624E22]
>>    System.out.print("Final w: " + model.weights() + "\n\n");
>>  }
>> }
>>
>>  I would appreciate your kind help or guidance very much~~
>>
>> Thank you!
>> Zhiliang
>>
>>
>
>
>
>
> --
> - DB
> Sent from my iPhone
>
>
>
>

```
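The thread's two observations (LinearRegression fits an intercept by default, and the 4x4 system can be solved directly) can be checked on the non-triangular data without Spark. The sketch below is plain Java, not the Spark or breeze API. The class name `LinearSystemDemo` and its helper methods are hypothetical. The intercept value 0.5 is not reported anywhere in the thread; it is an assumption, worked out by hand, that makes the reported weights (rounded here to [1, 2, 3.5, 3.5]) fit all four rows exactly. With an intercept there are five unknowns and only four equations, so many exact fits exist, which may be why x3 and x4 did not come out as 3 and 4.

```java
import java.util.Arrays;

final class LinearSystemDemo {

    /** Solves A x = b by Gaussian elimination with partial pivoting (A square, invertible). */
    static double[] solve(double[][] a, double[] b) {
        int n = b.length;
        // Work on an augmented copy [A | b] so the inputs stay untouched.
        double[][] m = new double[n][n + 1];
        for (int i = 0; i < n; i++) {
            System.arraycopy(a[i], 0, m[i], 0, n);
            m[i][n] = b[i];
        }
        for (int col = 0; col < n; col++) {
            // Partial pivoting: use the row with the largest entry in this column.
            int pivot = col;
            for (int r = col + 1; r < n; r++) {
                if (Math.abs(m[r][col]) > Math.abs(m[pivot][col])) pivot = r;
            }
            if (Math.abs(m[pivot][col]) < 1e-12) {
                throw new IllegalArgumentException("Matrix is singular or nearly singular");
            }
            double[] tmp = m[col];
            m[col] = m[pivot];
            m[pivot] = tmp;
            // Eliminate the entries below the pivot.
            for (int r = col + 1; r < n; r++) {
                double f = m[r][col] / m[col][col];
                for (int c = col; c <= n; c++) m[r][c] -= f * m[col][c];
            }
        }
        // Back substitution on the upper-triangular system.
        double[] x = new double[n];
        for (int i = n - 1; i >= 0; i--) {
            double s = m[i][n];
            for (int c = i + 1; c < n; c++) s -= m[i][c] * x[c];
            x[i] = s / m[i][i];
        }
        return x;
    }

    /** Largest absolute error of label = intercept + features . w over all rows. */
    static double maxResidual(double[][] a, double[] b, double[] w, double intercept) {
        double worst = 0.0;
        for (int i = 0; i < b.length; i++) {
            double pred = intercept;
            for (int j = 0; j < w.length; j++) pred += a[i][j] * w[j];
            worst = Math.max(worst, Math.abs(pred - b[i]));
        }
        return worst;
    }

    public static void main(String[] args) {
        // Features and labels from the non-triangular example in the thread.
        double[][] a = {{1, 2, 3, 4}, {0, 2, 3, 4}, {0, 0, 3, 4}, {0, 0, -1, 0}};
        double[] b = {30, 29, 25, -3};

        // Without an intercept the system has a unique solution.
        System.out.println("No intercept: " + Arrays.toString(solve(a, b)));

        // Reported weights (rounded) with the assumed intercept 0.5 also fit every row.
        double[] reported = {1, 2, 3.5, 3.5};
        System.out.println("Max residual with intercept 0.5: " + maxResidual(a, b, reported, 0.5));
    }
}
```

If the goal is to recover x from Ax = b, forcing the intercept to zero (as suggested in the thread) or solving the system directly removes the extra degree of freedom.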