spark-user mailing list archives

From Robin East <>
Subject Re: Questions about ml.random forest (only one decision tree?)
Date Thu, 04 Aug 2016 22:05:39 GMT
All supervised learning algorithms in Spark work the same way. You provide a set of ‘features’
(X) and a corresponding label (y) as part of a pipeline and call the fit method on the pipeline.
The output of this is a model. You can then provide new examples (new Xs) to a transform method
on the model that will give you a prediction for those examples. This means that the code
for running different algorithms often looks very similar. The details of the algorithm are
hidden behind the fit/transform interface.
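
To make the fit/transform pattern concrete, here is a minimal pure-Python sketch. The classes `MajorityClassEstimator` and `MajorityClassModel` are hypothetical stand-ins invented for illustration; they are not Spark's API, and the "algorithm" is deliberately trivial (always predict the most frequent label). Only the shape of the calls is meant to mirror what is described above.

```python
# Toy illustration of the fit/transform pattern. These classes are
# hypothetical stand-ins, NOT Spark's API.
from collections import Counter


class MajorityClassEstimator:
    """'Estimator': fit() learns from (features, label) rows and returns a model."""

    def fit(self, rows):
        # rows is a list of (features, label) pairs
        counts = Counter(label for _features, label in rows)
        most_common_label = counts.most_common(1)[0][0]
        return MajorityClassModel(most_common_label)


class MajorityClassModel:
    """'Model': transform() takes new feature rows and attaches a prediction."""

    def __init__(self, label):
        self.label = label

    def transform(self, feature_rows):
        # Return each input row paired with its predicted label.
        return [(features, self.label) for features in feature_rows]


training = [([0.0, 1.0], "a"), ([1.0, 1.0], "a"), ([5.0, 0.0], "b")]
model = MajorityClassEstimator().fit(training)            # fit -> model
predictions = model.transform([[2.0, 2.0], [9.0, 9.0]])   # transform -> predictions
```

Swapping in a different algorithm would only change what happens inside `fit`; the calling code stays the same, which is the point made above.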

In the case of Random Forest, the implementation in Spark (i.e. behind the interface)
creates a number of different decision tree models (often quite simple models) and then
combines (ensembles) the predictions of the individual trees. You don’t need to ‘create’
the decision trees yourself; that is handled by the implementation.
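
As a rough sketch of what "many trees, then ensemble" means, the pure-Python toy below trains several one-split trees ("stumps") on bootstrap samples and takes a majority vote. It is not Spark's implementation: the function names, the data and the `seed` argument are assumptions made for illustration, and with a single scalar feature it is really just bagging (a full random forest also samples a random subset of features at each split). In Spark ML itself, the number of trees is set through the `numTrees` parameter of `RandomForestClassifier`.

```python
import random
from collections import Counter


def train_stump(rows):
    """Fit a one-split decision tree ('stump') on (x, label) rows with scalar x."""
    best = None
    for threshold in sorted({x for x, _ in rows}):
        left = [y for x, y in rows if x <= threshold]
        right = [y for x, y in rows if x > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        # If nothing falls to the right of the split, reuse the left label.
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label


def train_forest(rows, num_trees, seed=0):
    """Train num_trees stumps, each on its own bootstrap sample of rows."""
    rng = random.Random(seed)
    trees = []
    for _ in range(num_trees):
        sample = [rng.choice(rows) for _ in rows]  # sample with replacement
        trees.append(train_stump(sample))
    return trees


def predict(trees, x):
    """Ensemble step: each tree votes and the most common label wins."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]


rows = [(1.0, "low"), (2.0, "low"), (3.0, "low"),
        (7.0, "high"), (8.0, "high"), (9.0, "high")]
forest = train_forest(rows, num_trees=25)
```

Calling `predict(forest, 0.5)` or `predict(forest, 9.5)` asks all 25 trees and returns the majority label. The caller never builds an individual tree by hand.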

Hope that helps

Robin East
Spark GraphX in Action Michael Malak and Robin East
Manning Publications Co. <>

> On 4 Aug 2016, at 09:48, 陈哲 <> wrote:
> Hi all
>      I'm trying to use Spark ML to do some prediction with random forest. By reading
> the example code, I can only find out that it's similar to
> Is the random forest algorithm supposed to use multiple decision trees to work?
>      I'm new to Spark and ML. Could anyone help me, maybe by providing an example of
> using multiple decision trees in a random forest in Spark?
> Thanks
> Best Regards
> Patrick
