spark-issues mailing list archives

From "Max Moroz (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-16319) Non-linear (DAG) pipelines need better explanation
Date Fri, 01 Jul 2016 21:57:11 GMT

    [ https://issues.apache.org/jira/browse/SPARK-16319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15359711#comment-15359711 ]

Max Moroz commented on SPARK-16319:
-----------------------------------

[~srowen] regarding inputCol / outputCol: not every class has them. In ml.feature, RFormula,
SQLTransformer, and ChiSqSelector lack inputCol, outputCol, or both. In other subpackages,
such as ml.classification and ml.clustering, none of the classes have these parameters.
I am guessing that in some cases featuresCol and predictionCol / labelCol serve the same
purpose, but that is not something users should have to guess about. In other cases, there
is no obvious guess at all.
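The linked ML guide paragraph says the DAG is specified implicitly by each stage's input and output column names, and that the stages must then be listed in topological order. The check the issue finds missing could look roughly like the pure-Python sketch below. It is hypothetical: `Stage` and `check_topological_order` are not Spark APIs, and each stage's columns are declared by hand precisely because, as noted above, Spark stages do not expose them through a uniform inputCol/outputCol pair.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Stage:
    """Hypothetical stand-in for a pipeline stage (not a Spark class).

    Only the columns the stage reads and writes are modeled; they are
    declared by hand because Spark stages do not expose them uniformly.
    """
    name: str
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)


def check_topological_order(stages: List[Stage], initial_columns: List[str]) -> List[str]:
    """Report stages that read a column not yet produced.

    Stages are visited in the order given, as a Pipeline applies them;
    `available` tracks the input DataFrame's columns plus every column
    written by an earlier stage. An empty result means the order is a
    valid topological order of the implicit column graph.
    """
    available = set(initial_columns)
    problems = []
    for stage in stages:
        missing = [c for c in stage.inputs if c not in available]
        if missing:
            problems.append(f"{stage.name}: missing input column(s) {missing}")
        available.update(stage.outputs)
    return problems


# Hypothetical DAG: two independent branches (text -> tf, category -> index)
# joined by an assembler. Stage and column names are illustrative only.
stages = [
    Stage("tokenizer", ["text"], ["words"]),
    Stage("hashingTF", ["words"], ["tf"]),
    Stage("indexer", ["category"], ["categoryIndex"]),
    Stage("assembler", ["tf", "categoryIndex"], ["features"]),
]

print(check_topological_order(stages, ["text", "category"]))  # []
print(check_topological_order(list(reversed(stages)), ["text", "category"]))  # flags assembler, hashingTF
```

Reversing the list flags `assembler` and `hashingTF`, which is the kind of ordering mistake the issue says is not explicitly validated when the pipeline is fit or applied.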


> Non-linear (DAG) pipelines need better explanation
> --------------------------------------------------
>
>                 Key: SPARK-16319
>                 URL: https://issues.apache.org/jira/browse/SPARK-16319
>             Project: Spark
>          Issue Type: Documentation
>          Components: ML
>    Affects Versions: 2.0.0
>            Reporter: Max Moroz
>            Priority: Minor
>
> There's a [paragraph|http://spark.apache.org/docs/2.0.0-preview/ml-guide.html#details]
> about non-linear pipelines in the ML docs, but it's not clear how a DAG pipeline differs
> from a linear pipeline. In fact, a "DAG pipeline" seems to behave identically to a regular
> linear pipeline: the stages are simply applied in the order provided when the pipeline is
> created. In addition, no checks of input and output columns seem to occur when
> pipeline.fit() or pipeline.transform() is called.
> It would be better to clarify this in the docs and/or remove that paragraph.
> I'd be happy to write it up, but at this point I have no idea what this concept is
> intended to achieve.
> [Additional reference on SO|http://stackoverflow.com/questions/37541668/non-linear-dag-ml-pipelines-in-apache-spark]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


