spark-user mailing list archives

From: Sameer Tilak <ssti...@live.com>
Subject: RE: Interpreting MLLib's linear regression o/p
Date: Tue, 23 Dec 2014 00:50:02 GMT

Hi,

It is a text format in which each line represents a labeled sparse feature vector using
the following format:

    label index1:value1 index2:value2 ...

This was the confusing part in the documentation:

"where the indices are one-based and in ascending order. After loading, the feature indices
are converted to zero-based."
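
To make the quoted sentence concrete, here is a minimal Python sketch of a parser for one line in this format. It is not MLlib's loader: `parse_libsvm_line` is a hypothetical helper written only for illustration, and it assumes that "converted to zero-based" means subtracting 1 from each index in the file.

```python
def parse_libsvm_line(line):
    """Parse 'label index1:value1 index2:value2 ...'.

    Returns (label, {zero_based_index: value}).
    Hypothetical helper for illustration; not part of Spark or MLlib.
    """
    tokens = line.split()
    label = float(tokens[0])
    features = {}
    for token in tokens[1:]:
        index_text, value_text = token.split(":")
        one_based = int(index_text)
        if one_based < 1:
            raise ValueError("LIBSVM indices are one-based, got %d" % one_based)
        # Assumed conversion: file index k becomes in-memory position k - 1.
        features[one_based - 1] = float(value_text)
    return label, features


label, features = parse_libsvm_line("1 1:1 2:0 40:1")
print(label, features)
```

Under that assumption, index 1 in the file ends up at position 0 in memory and index 40 at position 39.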

Let us say that I have 40 features, so I create an index file like this:

    Feature   Index number
    F1        1
    F2        2
    F3        3
    ...
    F40       40

I then create my feature vectors in the libsvm format, something like:

    1 10:1 20:0 8:1 4:0 24:1
    1 1:1 40:0 2:1 8:0 9:1 23:1
    0 23:1 18:0 13:1 .....

I run regression and get back models.weights, which are 40 weights. Say I get:

    0.11
    0.3445
    0.00005
    ...

In that case, does the first weight (0.11) correspond to index 1 / F1, or does it correspond
to index 2 / F2, since the input is 1-based and the o/p is 0-based? Or is 0-based indexing only
for the internal representation, and what you get back at the end of regression is essentially
1-based like the input, so that 0.11 maps onto F1 and so on?
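
The two readings in the question can be written out side by side. This is only a sketch: `feature_names` and `name_for_weight` are made-up names, the weights are example values, and it assumes the weight vector is positionally aligned with the zero-based feature vector, which is what the quoted documentation sentence suggests but is exactly the point being asked about.

```python
# Hypothetical index file: LIBSVM (one-based) index -> feature name, F1..F40.
feature_names = {i: "F%d" % i for i in range(1, 41)}

# Example weights; only the first one (0.11) is referred to in the question.
weights = [0.11, 0.3445, 0.00005]


def name_for_weight(position, offset):
    """Look up the feature name for the weight at a zero-based position.

    offset=1 assumes position i corresponds to LIBSVM index i + 1.
    offset=2 is the alternative reading, where position i maps to index i + 2.
    """
    return feature_names[position + offset]


for position, weight in enumerate(weights):
    print(position, weight,
          name_for_weight(position, offset=1),
          name_for_weight(position, offset=2))
```

With `offset=1` the first weight is attributed to F1; with `offset=2` it is attributed to F2. Checking a feature whose effect is already known against both columns is one way to tell which reading matches the data.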


> Date: Mon, 22 Dec 2014 16:31:57 -0800
> Subject: Re: Interpreting MLLib's linear regression o/p
> From: mengxr@gmail.com
> To: sstilak@live.com
> CC: user@spark.apache.org
> 
> Did you check the indices in the LIBSVM data and the master file? Do
> they match? -Xiangrui
> 
> On Sat, Dec 20, 2014 at 8:13 AM, Sameer Tilak <sstilak@live.com> wrote:
> > Hi All,
> > I use LIBSVM format to specify my input feature vectors, which uses 1-based
> > indices. When I run regression, the o/p is 0-based. I have a master
> > lookup file that maps these indices back to what they stand for. However, I
> > need to add an offset of 2 and not 1 to the regression outcome during the
> > mapping. So, for example, to map the index of 800 from the regression output
> > file, I look for 802 in my master lookup file and then things make sense. I
> > can understand adding an offset of 1, but I am not sure why adding an offset
> > of 2 is working fine. Have others seen something like this as well?
> >
 		 	   		  