spark-user mailing list archives

From Maisnam Ns
Subject Spark MLIB for Kaggle's CTR challenge
Date Sat, 03 Jan 2015 17:07:28 GMT
Hi,

I entered Kaggle's CTR challenge using the scikit-learn Python framework.
Although it gave me a reasonable score, I would like to explore Spark
MLlib, which I haven't used before. I also tried Vowpal Wabbit.

Can someone who has already worked with MLlib tell me whether it supports
online learning or batch SGD and, if so, how well it performs? I don't have
a Spark cluster, just a laptop.

Any suggestions?

The training data has close to 45 million rows in CSV format, and the test
data has close to 4.2 million rows in the same format.
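
To make concrete what I mean by online learning, here is a minimal sketch in
plain Python of one-row-at-a-time logistic regression with SGD and hashed
features. It is not Spark MLlib code and not what I actually submitted; the
names (hash_features, train), the bucket count, the learning rate and the
column names in the usage note are all assumptions for illustration only.

```python
import math
import zlib

N_BUCKETS = 2 ** 20   # assumed size of the hashed feature space
LEARNING_RATE = 0.1   # assumed constant step size

def hash_features(row):
    """Map each 'column=value' pair of a row (a dict) to a bucket index."""
    # crc32 is used instead of hash() so indices are stable across runs.
    return [zlib.crc32(f"{name}={value}".encode("utf-8")) % N_BUCKETS
            for name, value in row.items()]

def predict(weights, indices):
    """Predicted click probability: sigmoid of the summed active weights."""
    z = sum(weights[i] for i in indices)
    z = max(min(z, 35.0), -35.0)  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))

def update(weights, indices, label):
    """One SGD step on the log loss for a single example (label is 0 or 1)."""
    gradient = predict(weights, indices) - label
    for i in indices:
        weights[i] -= LEARNING_RATE * gradient

def train(rows):
    """Single pass over an iterable of (feature_dict, label) pairs."""
    weights = [0.0] * N_BUCKETS
    for features, label in rows:
        update(weights, hash_features(features), label)
    return weights
```

On a laptop, the rows could be streamed from the training file with
csv.DictReader so that the 45 million rows never have to fit in memory at
once, for example train(({"site": r["site"]}, int(r["click"])) for r in reader),
where "site" and "click" are hypothetical column names.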

