spark-user mailing list archives

From Manish Amde <>
Subject Re: DecisionTree Algorithm used in Spark MLLib
Date Fri, 02 Jan 2015 07:41:08 GMT
Hi Anoop,

The Spark decision tree implementation supports regression and multiclass
classification, continuous and categorical features, and pruning. It does not
support missing feature values at present. You can probably think of it as
distributed CART, though personally I always find the acronyms confusing.
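
To make "CART" a little more concrete: its core step is a greedy binary split
chosen by an impurity measure such as Gini. The following is only a
single-machine sketch in plain Python, not Spark's code; the function names
`gini` and `best_binary_split` are made up for illustration, and the candidate
thresholds are assumed to be supplied by the caller.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_binary_split(values, labels, thresholds):
    """Pick the threshold t whose split `value <= t` has the lowest
    size-weighted Gini impurity. Returns (t, impurity), or None if no
    threshold separates the data into two non-empty sides."""
    n = len(labels)
    best = None
    for t in thresholds:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        if not left or not right:
            continue  # a split must send rows to both children
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[1]:
            best = (t, score)
    return best


# Toy data: one continuous feature, two classes.
split = best_binary_split([1, 2, 3, 4], [0, 0, 1, 1], [1.5, 2.5, 3.5])
print(split)  # (2.5, 0.0)
```

A full tree applies this step recursively to each child until a stopping rule
(depth, minimum node size, minimum gain) is hit.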

How much of a difference are you seeing? There is a very small difference in
how the candidate split thresholds are calculated in various libraries (there
is no single right way), but it should not lead to a significant difference in

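As a rough illustration of how candidate thresholds can differ between
libraries, the sketch below contrasts two common strategies: midpoints between
adjacent distinct values, which many single-machine implementations use, and a
bounded number of quantile-style cut points, which is the general idea behind
MLlib's `maxBins` setting. Both functions are hypothetical helpers written for
this example; MLlib's actual binning works on a sample of the data and differs
in detail.

```python
def midpoint_thresholds(values):
    """Candidate thresholds halfway between consecutive distinct values."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def quantile_thresholds(values, max_bins):
    """At most max_bins - 1 candidate thresholds taken at evenly spaced
    ranks of the sorted values (a simplified stand-in for quantile binning)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        return []
    picks = {ordered[(i * n) // max_bins] for i in range(1, max_bins)}
    # With the rule `value <= t`, a threshold at the maximum sends no rows
    # to the right child, so it is not a useful candidate.
    picks.discard(ordered[-1])
    return sorted(picks)


feature = [1, 2, 3, 4, 5, 6, 7, 8]
print(midpoint_thresholds(feature))     # [1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5]
print(quantile_thresholds(feature, 4))  # [3, 5, 7]
```

The two lists never coincide exactly, so two libraries can report slightly
different split points for the same data even when they pick the same
partition of the training rows.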

On Monday, December 29, 2014, Anoop Shiralige <>

> Hi All,
> I am trying to do a comparison by building the model locally using R and
> on a cluster using Spark.
> There is some difference in the results.
> Any idea what the internal implementation of Decision Tree in Spark
> MLlib is (ID3, C4.5, C5.0, or CART)?
> Thanks,
> AnoopShiralige
