spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Earthson <>
Subject Re: what is the best way to implement mini batches?
Date Mon, 15 Dec 2014 02:34:56 GMT
I think it could be done like:

1. using mapPartition to randomly drop some partition
2. drop some elements randomly(for selected partition)
3. calculate gradient step for selected elements

I don't think fixed step is needed, but fixed step could be done:

1. zipWithIndex
2. create ShuffleRDD based on the index(eg. using index/10 as key)
3. using mapPartition to calculate each bach

I also have a question: 

Can mini batches run in parallel?
I think parallel all batches just like a full batch GD in some case. 

View this message in context:
Sent from the Apache Spark User List mailing list archive at

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message