spark-user mailing list archives

From Manish Amde <manish...@gmail.com>
Subject Re: DecisionTree Algorithm used in Spark MLLib
Date Fri, 02 Jan 2015 07:41:08 GMT
Hi Anoop,

The Spark decision tree implementation supports regression and multiclass
classification, continuous and categorical features, and pruning. It does not
support missing features at present. You can probably think of it as
distributed CART, though personally I always find the acronyms confusing.

How much of a difference are you seeing? There is a very small difference in
how the candidate split thresholds are calculated in various libraries (there
is no single right way), but it should not lead to a significant difference in
performance.
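
To make the point about candidate thresholds concrete, here is a minimal
Python sketch. It is not the actual Spark or R code; the function names and
the sample data are hypothetical. It assumes one library proposes the midpoint
between every pair of adjacent distinct values, while another proposes a
capped number of quantile-style cut points, roughly in the spirit of the
binning that MLlib's maxBins parameter controls.

```python
def midpoint_thresholds(values):
    """Exhaustive candidates: the midpoint between each pair of adjacent distinct values."""
    distinct = sorted(set(values))
    return [(lo + hi) / 2.0 for lo, hi in zip(distinct, distinct[1:])]


def quantile_thresholds(values, max_bins):
    """Approximate candidates: at most max_bins - 1 cut points taken at evenly spaced ranks."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for i in range(1, max_bins):
        value = ordered[(i * n) // max_bins]
        # Skip repeated cut points so every candidate is distinct.
        if not cuts or value != cuts[-1]:
            cuts.append(value)
    return cuts


# Hypothetical continuous feature column.
feature = [1.0, 2.0, 2.5, 4.0, 7.0, 8.0, 9.5, 12.0]

exact = midpoint_thresholds(feature)
approx = quantile_thresholds(feature, max_bins=4)

print("midpoint candidates:", exact)
print("quantile candidates:", approx)
```

The two lists contain different numbers, and the capped version considers
fewer candidates, so two libraries can pick slightly different thresholds for
the same feature even when they use the same impurity measure.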

-Manish


On Monday, December 29, 2014, Anoop Shiralige <anoop.shiralige@gmail.com>
wrote:

> Hi All,
>
> I am trying to do a comparison by building the model locally using R and
> on a cluster using Spark.
> There is some difference in the results.
>
> Any idea what the internal implementation of Decision Tree in Spark
> MLlib is (ID3, C4.5, C5.0, or CART)?
>
> Thanks,
> AnoopShiralige
>
