spark-user mailing list archives

From Jianguo Li
Subject Re: Does the kFold in Spark always give you the same split?
Date Fri, 30 Jan 2015 19:27:39 GMT
Thanks. I did specify a seed parameter.

It seems the problem is not caused by kFold. I ran another
experiment without cross validation: I just built a model on the training
data and then tested it on the test data. However, the accuracy
still varies from one run to another. Interestingly, this only happens when
I run the experiment on our cluster. If I run it on my local
machine, I can reproduce the result each time. Has anybody encountered
a similar issue before?



On Fri, Jan 30, 2015 at 11:22 AM, Sean Owen wrote:

> Have a look at the source code for MLUtils.kFold. Yes, there is a
> random element. That's good; you want the folds to be randomly chosen.
> Note there is a seed parameter, as in a lot of the APIs, that lets you
> fix the RNG seed and so get the same result every time, if you need
> to.
> On Fri, Jan 30, 2015 at 4:12 PM, Jianguo Li wrote:
> > Hi,
> >
> > I am using the utility function kFold provided in Spark to do k-fold
> > cross validation with logistic regression. However, each time I run the
> > experiment, I get a different result. Since everything else stays
> > constant, I was wondering if this is due to the kFold function I used.
> > Does anyone know if kFold gives you a different split of a data set each
> > time you call it?
> >
> > Thanks,
> >
> > Jianguo
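
For readers of this thread, here is a minimal sketch of the point about the
seed. It is plain Python and does not use Spark or reproduce the
MLUtils.kFold implementation; the helper name k_fold and the toy data are
hypothetical. It shows that a fixed seed reproduces the folds only when the
records are visited in the same order. Whether record or partition ordering
explains the cluster-only variation described above is an assumption, not
something confirmed in the thread.

```python
import random


def k_fold(items, num_folds, seed):
    """Hypothetical helper: split items into (training, validation) pairs.

    Each item is assigned to a fold by drawing from a seeded RNG in
    iteration order, so the result depends on both the seed and the order
    in which the items arrive.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(num_folds)]
    for item in items:
        folds[rng.randrange(num_folds)].append(item)
    # Each fold serves once as validation; the remaining folds form training.
    return [
        ([x for j, fold in enumerate(folds) if j != i for x in fold], folds[i])
        for i in range(num_folds)
    ]


data = list(range(20))

# Same seed, same input order: the folds are reproduced exactly.
same_order_a = k_fold(data, 3, seed=42)
same_order_b = k_fold(data, 3, seed=42)
print(same_order_a == same_order_b)

# Same seed, different input order: the folds are no longer identical.
reordered = k_fold(list(reversed(data)), 3, seed=42)
print(reordered == same_order_a)
```

If the input order is not stable between runs, one thing to try is sorting
the data (or otherwise fixing its order) before splitting and checking
whether the accuracy then becomes reproducible.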
