spark-issues mailing list archives

From "Huaxin Gao (Jira)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-30144) MLP param map missing
Date Sun, 08 Dec 2019 23:13:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-30144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16991018#comment-16991018 ]

Huaxin Gao commented on SPARK-30144:
------------------------------------

Currently, MultilayerPerceptronClassificationModel exposes only the params featuresCol,
labelCol, predictionCol, probabilityCol, and rawPredictionCol.

[~viirya] [~podongfeng] Is there any reason why MultilayerPerceptronClassificationModel
doesn't extend MultilayerPerceptronParams? If not, I will make it extend MultilayerPerceptronParams.
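
One way to confirm which params the fitted model lacks is to diff the param names of the estimator against those of the model. The following is only a minimal sketch: it assumes each object exposes a `params` list whose items have a `name` attribute (as PySpark's `Params.params` does, to my understanding). `FakeParam`, `FakeStage`, and `missing_param_names` are hypothetical names introduced here so the example runs without a Spark session, and the param names in the sample data are illustrative.

```python
from collections import namedtuple

# Hypothetical stand-ins so the sketch runs without a Spark session.
FakeParam = namedtuple("FakeParam", ["name"])
FakeStage = namedtuple("FakeStage", ["params"])


def missing_param_names(estimator, model):
    """Return the sorted names of params defined on the estimator but not on the model."""
    estimator_names = {p.name for p in estimator.params}
    model_names = {p.name for p in model.params}
    return sorted(estimator_names - model_names)


# Illustrative data only: the estimator carries training params, the model does not.
estimator = FakeStage([FakeParam(n) for n in ("featuresCol", "labelCol", "layers", "maxIter", "seed")])
model = FakeStage([FakeParam(n) for n in ("featuresCol", "labelCol")])
print(missing_param_names(estimator, model))
```

With real PySpark objects, the same function could presumably be called with the unfitted classifier and the model returned by `fit`.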

> MLP param map missing
> ---------------------
>
>                 Key: SPARK-30144
>                 URL: https://issues.apache.org/jira/browse/SPARK-30144
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 2.4.4
>            Reporter: Glen-Erik Cortes
>            Priority: Minor
>         Attachments: MLP_params_missing.ipynb, data_banknote_authentication.csv
>
>
> Param maps for fitted classifiers are available for all classifiers except MultilayerPerceptronClassifier.
>
> As a result, there is no way to tell which parameters were best during cross-validation
> or which parameters were used for submodels.
>
> The param map extracted for MultilayerPerceptronClassifier contains only these entries:
>
> {code:java}
> {
> Param(parent='MultilayerPerceptronClassifier_eeab0cc242d1', name='featuresCol', doc='features column name'): 'features',
> Param(parent='MultilayerPerceptronClassifier_eeab0cc242d1', name='labelCol', doc='label column name'): 'fake_banknote',
> Param(parent='MultilayerPerceptronClassifier_eeab0cc242d1', name='predictionCol', doc='prediction column name'): 'prediction',
> Param(parent='MultilayerPerceptronClassifier_eeab0cc242d1', name='probabilityCol', doc='Column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities'): 'probability',
> Param(parent='MultilayerPerceptronClassifier_eeab0cc242d1', name='rawPredictionCol', doc='raw prediction (a.k.a. confidence) column name'): 'rawPrediction'}{code}
>
> GBTClassifier, for example, shows all of its parameters:
>
> {code:java}
> {
> Param(parent='GBTClassifier_a0e77b3430aa', name='cacheNodeIds', doc='If false, the algorithm will pass trees to executors to match instances with nodes. If true, the algorithm will cache node IDs for each instance. Caching can speed up training of deeper trees.'): False,
> Param(parent='GBTClassifier_a0e77b3430aa', name='checkpointInterval', doc='set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will get checkpointed every 10 iterations. Note: this setting will be ignored if the checkpoint directory is not set in the SparkContext'): 10,
> Param(parent='GBTClassifier_a0e77b3430aa', name='featureSubsetStrategy', doc='The number of features to consider for splits at each tree node. Supported options: auto, all, onethird, sqrt, log2, (0.0-1.0], [1-n].'): 'all',
> Param(parent='GBTClassifier_a0e77b3430aa', name='featuresCol', doc='features column name'): 'features',
> Param(parent='GBTClassifier_a0e77b3430aa', name='labelCol', doc='label column name'): 'fake_banknote',
> Param(parent='GBTClassifier_a0e77b3430aa', name='lossType', doc='Loss function which GBT tries to minimize (case-insensitive). Supported options: logistic'): 'logistic',
> Param(parent='GBTClassifier_a0e77b3430aa', name='maxBins', doc='Max number of bins for discretizing continuous features. Must be >=2 and >= number of categories for any categorical feature.'): 8,
> Param(parent='GBTClassifier_a0e77b3430aa', name='maxDepth', doc='Maximum depth of the tree. (>= 0) E.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes.'): 5,
> Param(parent='GBTClassifier_a0e77b3430aa', name='maxIter', doc='maximum number of iterations (>= 0)'): 20,
> Param(parent='GBTClassifier_a0e77b3430aa', name='maxMemoryInMB', doc='Maximum memory in MB allocated to histogram aggregation.'): 256,
> Param(parent='GBTClassifier_a0e77b3430aa', name='minInfoGain', doc='Minimum information gain for a split to be considered at a tree node.'): 0.0,
> Param(parent='GBTClassifier_a0e77b3430aa', name='minInstancesPerNode', doc='Minimum number of instances each child must have after split. If a split causes the left or right child to have fewer than minInstancesPerNode, the split will be discarded as invalid. Should be >= 1.'): 1,
> Param(parent='GBTClassifier_a0e77b3430aa', name='predictionCol', doc='prediction column name'): 'prediction',
> Param(parent='GBTClassifier_a0e77b3430aa', name='seed', doc='random seed'): 1234,
> Param(parent='GBTClassifier_a0e77b3430aa', name='stepSize', doc='Step size (a.k.a. learning rate) in interval (0, 1] for shrinking the contribution of each estimator.'): 0.1,
> Param(parent='GBTClassifier_a0e77b3430aa', name='subsamplingRate', doc='Fraction of the training data used for learning each decision tree, in range (0, 1].'): 1.0}{code}
>  
> See attached ipynb or example notebook here:
> [https://colab.research.google.com/drive/1lwSHioZKlLh96FhGkdYFe6FUuRfTcSxH]
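
Until the model exposes its training params, one possible workaround for the cross-validation case described in the issue is to pair each candidate param map with its averaged metric and pick the best pair. This is a sketch under stated assumptions, not a fix for the bug: `best_param_map` is a hypothetical helper, and the grid and metric values are made up. In PySpark the inputs would presumably come from `getEstimatorParamMaps()` and the `avgMetrics` attribute of a fitted `CrossValidatorModel`, with the direction taken from the evaluator's `isLargerBetter()`.

```python
def best_param_map(param_maps, avg_metrics, larger_is_better=True):
    """Pair each candidate param map with its averaged metric and return the best pair."""
    if not param_maps or len(param_maps) != len(avg_metrics):
        raise ValueError("param_maps and avg_metrics must be non-empty and of equal length")
    pick = max if larger_is_better else min
    # Select the index of the best metric, then look up the matching param map.
    best_index = pick(range(len(avg_metrics)), key=lambda i: avg_metrics[i])
    return param_maps[best_index], avg_metrics[best_index]


# Made-up grid and metrics, for illustration only.
grid = [{"maxIter": 50}, {"maxIter": 100}, {"maxIter": 200}]
metrics = [0.91, 0.97, 0.95]
best_params, best_metric = best_param_map(grid, metrics)
print(best_params, best_metric)
```

This recovers the best grid entry but not params left at their defaults, so it only partly covers what the issue asks for.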


