spark-user mailing list archives

From Justin Yip <yipjus...@gmail.com>
Subject MLLib : Decision Tree with minimum points per node
Date Sat, 14 Jun 2014 03:55:03 GMT
Hello,

I have been playing around with mllib's decision tree library. It is
working great, thanks.

I have a question regarding overfitting. It appears to me that the current
implementation doesn't allow the user to specify the minimum number of samples
per node. As a result, some nodes contain only a few samples, which
can lead to overfitting.
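
To illustrate the constraint being asked about, here is a minimal sketch in
plain Python. It is not MLlib's API: the names `gini`, `best_split`, and
`min_samples_per_node` are hypothetical, and binary 0/1 labels on a single
numeric feature are assumed. A candidate split is rejected whenever either
child would hold fewer than the requested number of samples.

```python
from typing import List, Optional


def gini(labels: List[int]) -> float:
    """Gini impurity for binary 0/1 labels (assumption: two classes only)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split(values: List[float], labels: List[int],
               min_samples_per_node: int) -> Optional[float]:
    """Return the threshold with the lowest weighted Gini impurity among
    splits whose children both hold at least min_samples_per_node samples,
    or None if no admissible split improves on the parent node."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_threshold = None
    best_score = gini(labels)  # a split must beat the unsplit node
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal feature values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        # The min-samples-per-node rule: skip splits that create tiny nodes.
        if len(left) < min_samples_per_node or len(right) < min_samples_per_node:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_score = score
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
    return best_threshold


values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
labels = [0, 0, 0, 0, 0, 1]
print(best_split(values, labels, 1))  # isolates the single positive sample
print(best_split(values, labels, 2))  # forced to keep at least 2 per child
print(best_split(values, labels, 4))  # no admissible split for 6 samples
```

With a minimum of 1, the split isolates the lone positive sample in its own
node, which is the kind of very small node described above; raising the
minimum forces a coarser split or no split at all.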

I would like to know if there is a workaround or any other way to prevent
overfitting. Or will the decision tree support min-samples-per-node in future
releases?

Thanks.

Justin
