spark-user mailing list archives

From Ablaye FAYE <>
Subject [PySpark CrossValidator] Dropping column randCol before fitting model
Date Tue, 09 Jun 2020 08:59:38 GMT

I have noticed that the _fit method of the CrossValidator class in PySpark
adds a new column (randCol) to the input dataset. This column is used to
split the dataset into k folds.
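
The function below is a plain-Python sketch of the same idea, not PySpark code. Each row gets a uniform random value in [0, 1). Fold i uses the rows whose value falls in [i/k, (i+1)/k) as validation data and all remaining rows as training data. Rows are modeled as dicts rather than a Spark DataFrame. The function name `k_fold_by_rand_col`, the `_rand` key, and the `drop_rand_col` flag are hypothetical and exist only for illustration.

```python
import random
from typing import Dict, List, Tuple

Row = Dict[str, float]


def k_fold_by_rand_col(
    rows: List[Row],
    n_folds: int,
    seed: int = 42,
    rand_col: str = "_rand",        # hypothetical helper-column name
    drop_rand_col: bool = False,    # hypothetical flag, for illustration only
) -> List[Tuple[List[Row], List[Row]]]:
    """Return one (train, validation) pair per fold, split on a random column."""
    rng = random.Random(seed)
    # Copy each row (the input is not mutated) and attach a uniform value in [0, 1).
    tagged = [{**row, rand_col: rng.random()} for row in rows]

    def fold_of(row: Row) -> int:
        # Fold i owns the interval [i/k, (i+1)/k); min() guards float edge cases.
        return min(int(row[rand_col] * n_folds), n_folds - 1)

    def strip(row: Row) -> Row:
        # Optionally remove the helper column before the rows reach a model.
        if not drop_rand_col:
            return row
        return {k: v for k, v in row.items() if k != rand_col}

    folds = []
    for i in range(n_folds):
        validation = [strip(r) for r in tagged if fold_of(r) == i]
        train = [strip(r) for r in tagged if fold_of(r) != i]
        folds.append((train, validation))
    return folds
```

With `drop_rand_col=False`, the helper key stays on every row in both `train` and `validation`. This is the situation the question below asks about. In this sketch, a model that reads only its own feature and label keys would never look at that extra key.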

Is this column removed from the training and test data of each fold
before the model is fitted?

I ask because I have gone through all the code but haven't found any
place where this column is removed before the model is fitted.

Thanks for your help
