spark-user mailing list archives

From Sean Owen <so...@cloudera.com>
Subject Re: Non-linear (curved?) regression line
Date Fri, 20 Jan 2017 11:05:42 GMT
I don't think this is a Spark question. This isn't a problem you solve by
throwing all combinations of options at it. Your target is not a linear
function of input, or its square, and it's not a question of GLM link
function. You may need to look at the log-log plot because this looks like
a power-law distribution. I think you want to learn more about regression
and what it does first.
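
A minimal sketch of the log-log check suggested above, written in plain Python rather than Spark and using the numbers from the quoted table. The names `fit_power_law` and `predict` are hypothetical, and the model sales ≈ a * rank^b is an assumption being tested, not a confirmed fit: if the points are not close to a straight line on the log-log scale, a pure power law is not the right model either.

```python
import math

# (SalesRank, sales per day) pairs copied from the quoted message.
DATA = [
    (1, 4358), (5, 4283), (10, 4193), (15, 4104), (20, 4017),
    (50, 3532), (100, 2851), (150, 2302), (200, 1858), (250, 1499),
    (500, 989), (1000, 553), (2000, 367), (3500, 221), (5000, 139),
    (6000, 126), (7500, 108), (9000, 92), (10000, 83), (50000, 12),
    (75000, 5),
]


def fit_power_law(points):
    """Least-squares line through (ln rank, ln sales).

    Returns (a, b, r2) for the assumed model sales ~ a * rank**b,
    where r2 is the R^2 of the straight-line fit in log-log space.
    """
    xs = [math.log(rank) for rank, _ in points]
    ys = [math.log(sales) for _, sales in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope = power-law exponent
    ln_a = mean_y - b * mean_x         # intercept = ln of the scale factor
    ss_res = sum((y - (ln_a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot
    return math.exp(ln_a), b, r2


def predict(a, b, rank):
    """Predicted sales per day on the original scale."""
    return a * rank ** b


a, b, r2 = fit_power_law(DATA)
print(f"a={a:.1f}  b={b:.3f}  R^2 (log-log)={r2:.3f}")
for rank, sales in DATA:
    print(f"{rank:>6}  actual={sales:>5}  predicted={predict(a, b, rank):9.1f}")
```

Comparing the actual and predicted columns shows where the straight-line assumption breaks down. In Spark ML the same idea would amount to taking the log of both columns before fitting LinearRegression and exponentiating the predictions afterwards.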

On Fri, Jan 20, 2017 at 2:10 AM Ganesh <mail@ganeshkrishnan.com> wrote:

>
> Has anyone worked on non-linear/curved regression lines with Apache Spark?
> This seems to be such a trivial issue but I have given up after
> experimenting for nearly two weeks.
> The plot is below and the raw data is in the table at the end.
> I just can't get Spark ML to give decent predictions with
> LinearRegression or any family in GeneralizedLinearRegression.
>
> I need to predict 'sales per day' given SalesRank. As the chart shows, it's
> some kind of exponential function: the lower the rank, the exponentially
> higher the sales.
>
> Things I have tried:
> Polynomial by taking square of features
> Changing family for GLR
> Changing regression parameters
> Sacrificing a goat to the Apache gods.
>
> How do I go about solving this? Do I have to resort to neural networks?
>
>
>
>
> Features (SalesRank)   Label (sales per day)
> 1                      4358
> 5                      4283
> 10                     4193
> 15                     4104
> 20                     4017
> 50                     3532
> 100                    2851
> 150                    2302
> 200                    1858
> 250                    1499
> 500                    989
> 1000                   553
> 2000                   367
> 3500                   221
> 5000                   139
> 6000                   126
> 7500                   108
> 9000                   92
> 10000                  83
> 50000                  12
> 75000                  5
>
>
>
