spark-user mailing list archives

From Jianmin Wu
Subject questions about logistic regression in mllib
Date Thu, 19 Sep 2013 01:21:03 GMT
Hi all, 
I read the logistic regression (LR) implementation in Spark MLlib and have
several questions. Could anyone here help explain?
1. The implementation uses a dense representation of the feature vectors,
but in most cases the feature vectors are highly sparse. Is there any plan
for a version that supports sparse feature vectors? Or is the dense
representation an intentional choice?
2. Is there any experimental data on convergence performance? Setting the
learning rate is tricky, and the current implementation uses a fairly
straightforward learning rate update rule.
3. Is there any research on setting the learning rate in practice? As a
matter of fact, I implemented a Python version of LR with stochastic
gradient descent for sparse feature vectors in Spark, and I am facing some
convergence issues. I could not find any clues in Tong Zhang's work "Solving
Large Scale Linear Prediction Problems Using Stochastic Gradient Descent
Algorithms" or in related papers such as "Pegasos: Primal Estimated
sub-Gradient solver for SVM".
Any suggestions and explanations are appreciated.
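
For concreteness, a minimal single-machine sketch of SGD for logistic
regression over sparse feature vectors, of the kind described in question 3,
might look like the following. It is not MLlib's code and not the
implementation mentioned above: the dict-based sparse representation, the
function names, and the eta0 / sqrt(t) step-size decay are all assumptions
made for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(weights, features):
    # features is a sparse vector stored as {feature_index: value};
    # indices missing from weights are treated as zero.
    z = sum(weights.get(i, 0.0) * v for i, v in features.items())
    return sigmoid(z)


def sgd_logistic_sparse(examples, eta0=1.0, epochs=20, seed=0):
    """Hypothetical helper: examples is a list of (features, label)
    pairs, with features a sparse dict and label in {0, 1}."""
    rng = random.Random(seed)
    weights = {}
    order = list(range(len(examples)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for k in order:
            features, label = examples[k]
            t += 1
            # Assumed schedule: step size decays as eta0 / sqrt(t).
            eta = eta0 / math.sqrt(t)
            error = predict_proba(weights, features) - label
            # Only the non-zero features of this example are updated.
            for i, v in features.items():
                weights[i] = weights.get(i, 0.0) - eta * error * v
    return weights


examples = [
    ({0: 1.0, 5: 1.0}, 1),
    ({0: 1.0, 7: 1.0}, 0),
    ({5: 1.0}, 1),
    ({7: 1.0}, 0),
]
weights = sgd_logistic_sparse(examples)
```

Each update costs time proportional to the number of non-zero features in
the example, not to the full dimensionality. Convergence is sensitive to
eta0 and to the decay schedule, which is the tuning problem the questions
above are about.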

Thanks in advance,
