spark-issues mailing list archives

From "Yanbo Liang (JIRA)"
Subject [jira] [Commented] (SPARK-7132) Add fit with validation set to GBT
Date Tue, 01 Sep 2015 07:01:46 GMT


Yanbo Liang commented on SPARK-7132:

I will work on this issue.
I propose a different way to resolve it.
The GBT Estimator would still take a single input DataFrame, and we would split it internally
into training and validation datasets.
Because the runWithValidation interface takes RDD[LabeledPoint] inputs for both the training
and the validation set, this is easy to handle.
At the end of the GBT Estimator's fit, we could also union the two datasets back together.
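The internal split described in this proposal can be sketched as follows. This is a minimal plain-Python illustration, not Spark code: the `split_train_validation` helper, the `(label, features)` tuple standing in for `LabeledPoint`, and the `validation_fraction` parameter are all hypothetical. Inside Spark, the `randomSplit` methods on RDDs and DataFrames would be the natural tool for this step.

```python
import random
from typing import List, Tuple

# Hypothetical stand-in for spark.mllib's LabeledPoint: (label, features).
LabeledPoint = Tuple[float, List[float]]


def split_train_validation(
    data: List[LabeledPoint], validation_fraction: float, seed: int = 42
) -> Tuple[List[LabeledPoint], List[LabeledPoint]]:
    """Hypothetical helper: randomly assign each row to training or validation."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be in (0, 1)")
    rng = random.Random(seed)  # seeded so the split is reproducible
    train: List[LabeledPoint] = []
    validation: List[LabeledPoint] = []
    for row in data:
        # Each row independently goes to validation with probability validation_fraction.
        if rng.random() < validation_fraction:
            validation.append(row)
        else:
            train.append(row)
    return train, validation
```

Concatenating `train + validation` recovers every original row (in a different order), which corresponds to the union step at the end of the proposal.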

> Add fit with validation set to GBT
> -------------------------------------------
>                 Key: SPARK-7132
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: Joseph K. Bradley
>            Priority: Minor
> In spark.mllib GradientBoostedTrees, we have a method runWithValidation which takes a
> validation set.  We should add that to the API.
> This will require a bit of thinking about how the Pipelines API should handle a validation
> set (since Transformers and Estimators only take 1 input DataFrame).  The current plan is
> to include an extra column in the input DataFrame which indicates whether the row is for training,
> validation, etc.
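The indicator-column plan, and the reason a validation set matters for boosting, can be sketched together. This is plain Python under stated assumptions: the `isValidation` column name and both helper functions are hypothetical, and the stopping rule is a generic early-stopping criterion in the spirit of runWithValidation, not a copy of Spark's exact logic.

```python
from typing import Dict, List, Sequence, Tuple

# A row is modeled as a dict of column name -> value (hypothetical stand-in for a DataFrame row).
Row = Dict[str, object]


def partition_by_indicator(
    rows: List[Row], indicator_col: str = "isValidation"
) -> Tuple[List[Row], List[Row]]:
    """Split rows on a hypothetical boolean indicator column."""
    train = [r for r in rows if not r[indicator_col]]
    validation = [r for r in rows if r[indicator_col]]
    return train, validation


def best_num_iterations(validation_errors: Sequence[float], tol: float) -> int:
    """Generic early stopping: given the validation error after each boosting
    iteration, keep iterations while each one improves on the best error so far
    by at least tol; return how many iterations to keep."""
    best_err = float("inf")
    best_iter = 0
    for i, err in enumerate(validation_errors):
        if best_err - err < tol:
            break  # improvement too small: stop adding trees
        best_err, best_iter = err, i + 1
    return best_iter
```

For validation errors `[0.5, 0.3, 0.25, 0.249, 0.26]` and `tol=0.01`, the fourth iteration improves by only 0.001, so the sketch keeps the first three iterations.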

This message was sent by Atlassian JIRA

