spark-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: How to Improve Random Forest classifier accuracy
Date Thu, 18 Aug 2016 08:46:07 GMT
It depends on your data.
How did you split training and test set?
How does the model fit to the data?
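The split matters here because the test set in the question has about 951 positive and 13,788 negative rows, so label 1 is only about 6.5% of the data. A stratified split shuffles each label separately so that train and test keep the same class ratio. The following is a minimal sketch in plain Python, not Spark code. The function name `stratified_split` and the synthetic rows are hypothetical. In Spark itself, `randomSplit` samples rows independently, so the ratio is only approximately preserved (usually closely at this data size). `DataFrame.stat().sampleBy(...)` can draw a different fraction per label.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_of, test_fraction=0.3, seed=42):
    """Hypothetical helper: split rows so each label keeps the same share in train and test."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):
        group = list(by_label[label])
        rng.shuffle(group)  # shuffle within one label only
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Synthetic example (hypothetical data): every 15th row is positive.
rows = [(i, 1 if i % 15 == 0 else 0) for i in range(1500)]
train, test = stratified_split(rows, label_of=lambda r: r[1])
test_pos = sum(1 for _, label in test if label == 1)
print(len(train), len(test), test_pos)
```

With these 1,500 synthetic rows (100 positives), the test set gets 30 positives and 420 negatives. That is the same 1:14 ratio as the full data.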

You could, of course, also try feeding more data into the model.
Have you considered alternative machine learning models?

I do not think this is a Spark problem. You should ask a machine learning specialist who knows your data and random forests.


> On 18 Aug 2016, at 10:31, 陈哲 <czhenjupt@gmail.com> wrote:
> 
> Hi all,
>    I am using the Spark ML Random Forest classifier. I have only two label categories (1, 0), about 30 features and a data size of over 100,000. I ran the Spark JavaRandomForestClassifierExample code, and the model produced these results (I made some changes to show more detailed results):
> Test Error = 0.022321731460750338
> Prediction results label = 1 count:951
> Prediction results label = 0 count:13788
> Prediction results predictedLabel = 1 and label = 1 count:682
> Prediction results predictedLabel = 1 and label = 0 count:60
> Prediction results predictedLabel = 0 and label = 1 count:269
> Prediction Right = 0.7171398527865405
> Prediction Miss= 0.28286014721345953
> Prediction Wrong= 0.004351610095735422
> 
> I need some advice on how to improve the accuracy. I tried changing classifier parameters such as maxDepth and maxBins, but the results did not change much.
> Do I have to add more features, or are there other ways to improve this?
> 
> Thanks
> 
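The reported figures can be recomputed from the confusion-matrix counts in the question. The following is plain Python arithmetic on those counts. The names `tp`, `fp`, `fn` and `tn` are introduced here for illustration. The mapping of "Prediction Right", "Miss" and "Wrong" to recall, miss rate and false positive rate is inferred from the fact that the ratios match; the message does not state it. The arithmetic shows that the roughly 0.978 accuracy should be compared with the roughly 0.935 that a classifier gets by always predicting 0. The weak spot is recall on label 1 (about 0.717) rather than precision (about 0.919).

```python
# Counts from the "Prediction results" lines in the quoted message.
positives = 951          # label = 1
negatives = 13788        # label = 0
tp = 682                 # predictedLabel = 1 and label = 1
fp = 60                  # predictedLabel = 1 and label = 0
fn = 269                 # predictedLabel = 0 and label = 1
tn = negatives - fp      # predictedLabel = 0 and label = 0 (derived, not reported)

total = positives + negatives
accuracy = (tp + tn) / total
test_error = 1 - accuracy                 # matches "Test Error"
recall = tp / (tp + fn)                   # matches "Prediction Right"
miss_rate = fn / (tp + fn)                # matches "Prediction Miss"
false_positive_rate = fp / (fp + tn)      # matches "Prediction Wrong"
precision = tp / (tp + fp)
f1 = 2 * precision * recall / (precision + recall)
baseline_accuracy = negatives / total     # a model that always predicts label 0

for name, value in [
    ("accuracy", accuracy),
    ("baseline accuracy (always 0)", baseline_accuracy),
    ("precision (label 1)", precision),
    ("recall (label 1)", recall),
    ("F1 (label 1)", f1),
]:
    print(f"{name}: {value:.3f}")
```

Label 1 is rare, so precision, recall or F1 on that class is usually a more informative target than overall accuracy when tuning maxDepth, maxBins and similar parameters. The area under the ROC or PR curve serves the same purpose. Spark's `BinaryClassificationEvaluator` supports `areaUnderROC` and `areaUnderPR` as metric names.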
