spark-user mailing list archives

From cjwang ...@cjwang.us>
Subject Garbage stats in Random Forest leaf node?
Date Tue, 17 Mar 2015 00:19:43 GMT
I dumped the trees in the random forest model, and occasionally saw a leaf
node with strange stats:

- pred=1.000000 prob=0.800000 imp=-1.000000
gain=-179769313486231570000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.000000


Here impurity = -1 and gain is a giant negative number (it appears to be
Double.MinValue).  Normally, I would get a None from Node.stats at a leaf
node.  Here it printed because the Some(s) case matches:

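	    node.stats match {
	        // Some(s): the node carries stats, so print impurity and gain
	        case Some(s) => println(" imp=%f gain=%f".format(s.impurity, s.gain))
	        // None: no stats, so just end the line
	        case None => println()
	    }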
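
For the dump itself, a guard along the following lines might filter out such values. This is only a sketch: Stats, Node, and LeafStats are hypothetical stand-ins so that it runs without Spark, and treating a negative impurity or a gain of Double.MinValue as a placeholder is an assumption, not something confirmed by the MLlib documentation.

```scala
// Hypothetical stand-ins for the MLlib node and stats classes,
// defined here only so the sketch runs without Spark.
final case class Stats(gain: Double, impurity: Double)
final case class Node(id: Int, stats: Option[Stats])

object LeafStats {
  // Assumption: a negative impurity or a gain equal to Double.MinValue
  // marks a placeholder rather than a computed value.
  def isPlaceholder(s: Stats): Boolean =
    s.impurity < 0.0 || s.gain == Double.MinValue

  // Format the stats for printing. A missing value (None) and a
  // placeholder value both yield the empty string.
  def describe(node: Node): String =
    node.stats.filterNot(isPlaceholder) match {
      case Some(s) => " imp=%f gain=%f".format(s.impurity, s.gain)
      case None    => ""
    }
}
```

With this, println(LeafStats.describe(node)) prints an empty line for the leaf shown above, the same as for a leaf whose stats are None.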


Is this a bug?

This doesn't seem to happen in models trained with DecisionTree, but my data
sets are limited.



