spark-user mailing list archives

From Sameer Tilak <>
Subject MLLib libsvm format
Date Tue, 21 Oct 2014 20:10:14 GMT
Hi All,

I have a question regarding the ordering of indices. The documentation says that the indices
are one-based and in ascending order. However, do the indices within a row actually need to
be sorted in ascending order?

Sparse data

It is very common in practice to have sparse training data. MLlib supports reading
training examples stored in LIBSVM format, which is the default format used by LIBSVM and
LIBLINEAR. It is a text format in which each line represents a labeled sparse feature vector
using the following format:

label index1:value1 index2:value2 ...

where the indices are one-based and in ascending order. After loading, the feature indices
are converted to zero-based.
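
To make the format described above concrete, here is a minimal Python sketch of how one such line could be parsed by hand. It is not MLlib's own loader; the function names `parse_libsvm_line` and `is_strictly_ascending` are hypothetical and exist only for illustration. It shifts the one-based indices to zero-based, as the documentation describes, and checks whether they are in ascending order.

```python
def parse_libsvm_line(line):
    """Parse one 'label index:value ...' line into (label, indices, values).

    Hypothetical helper, not part of MLlib. Indices are shifted from
    one-based (file format) to zero-based (after loading).
    """
    tokens = line.split()
    label = float(tokens[0])
    indices = []
    values = []
    for token in tokens[1:]:
        index_text, value_text = token.split(":")
        indices.append(int(index_text) - 1)  # one-based -> zero-based
        values.append(float(value_text))
    return label, indices, values


def is_strictly_ascending(indices):
    """Return True if each index is larger than the one before it."""
    return all(a < b for a, b in zip(indices, indices[1:]))


label, indices, values = parse_libsvm_line("1 80:0.5 110:1.0")
print(label, indices, values, is_strictly_ascending(indices))
```

A check like `is_strictly_ascending` can be run over a data file before loading it, to see whether any row departs from the documented ordering.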

For example, my feature indices range from 1 to 1000. Is a line like this OK in a LIBSVM data file?
1 110:1.0 80:0.5 310:0.00 890:0.5 20:0.0 200:0.5 400:1.0 82:0.0 (and so on)
Or do I need to sort them, as in:
1 20:0.0 80:0.5 82:0.0 110:1.0 200:0.5 310:0.00 400:1.0 890:0.5
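
If sorting does turn out to be necessary, rewriting each row so that it matches the documented ordering is straightforward. The following is a minimal sketch, assuming every feature token has the form `index:value`; the helper name `sort_libsvm_line` is hypothetical and not part of MLlib or LIBSVM.

```python
def sort_libsvm_line(line):
    """Return the line with its index:value pairs sorted by ascending index.

    Hypothetical helper for illustration; the label stays first and the
    value text is left exactly as written.
    """
    tokens = line.split()
    label, features = tokens[0], tokens[1:]
    # Sort numerically on the index part, not as strings.
    features.sort(key=lambda token: int(token.split(":")[0]))
    return " ".join([label] + features)


unsorted_line = "1 110:1.0 80:0.5 310:0.00 890:0.5 20:0.0 200:0.5 400:1.0 82:0.0"
print(sort_libsvm_line(unsorted_line))
```

Sorting on the integer value of the index matters here: a plain string sort would place `110` before `20`.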