spark-user mailing list archives

From Qingsheng Ren <renqs...@gmail.com>
Subject [Spark MLlib]: Multiple input dataframes and non-linear ML pipeline
Date Thu, 09 Apr 2020 08:36:35 GMT
Hi all,

I'm using ML Pipeline to construct a flow of transformations. Is it possible
to use multiple DataFrames as the input of a transformer? For example, I need
to join two DataFrames inside a transformer and then feed the result into an
estimator for training. If this is not supported, are there any plans to
support it in the future?

Another question is about non-linear pipelines. Since the input and output
columns of a pipeline stage can be assigned arbitrarily, what happens if I
build a problematic graph of stages (for example, one containing a cycle)?
Is there any mechanism to prevent this?
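
To make the second question concrete, here is a minimal sketch, in plain Python rather than Spark code, of how a circular dependency could be detected from stage column names alone. As far as I understand the Spark ML guide, pipeline stages run in the order they are listed, and a non-linear pipeline must already be given in topological order; this sketch does not reflect how Spark checks anything internally. The `(name, input columns, output columns)` tuple format and the `topological_order` helper are hypothetical, chosen only for illustration.

```python
from collections import deque
from typing import Dict, List, Sequence, Tuple

# Hypothetical stage description (not a Spark type):
# (stage name, input column names, output column names).
Stage = Tuple[str, Sequence[str], Sequence[str]]


def topological_order(stages: List[Stage]) -> List[str]:
    """Return stage names in an order where every stage runs after the
    stages whose output columns it reads.

    Raises ValueError if two stages write the same column or if the
    column dependencies form a cycle.
    """
    # Map each output column to the single stage that produces it.
    producer: Dict[str, str] = {}
    for name, _, outputs in stages:
        for col in outputs:
            if col in producer:
                raise ValueError(
                    f"column {col!r} produced by both {producer[col]!r} and {name!r}"
                )
            producer[col] = name

    # Add an edge A -> B whenever stage B reads a column that stage A writes.
    successors: Dict[str, List[str]] = {name: [] for name, _, _ in stages}
    indegree: Dict[str, int] = {name: 0 for name, _, _ in stages}
    for name, inputs, _ in stages:
        for col in inputs:
            src = producer.get(col)  # None: column comes from the input DataFrame
            if src is not None:
                successors[src].append(name)
                indegree[name] += 1

    # Kahn's algorithm: repeatedly emit stages whose dependencies are all met.
    ready = deque(name for name, _, _ in stages if indegree[name] == 0)
    order: List[str] = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for nxt in successors[current]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    # Any stage never emitted is part of (or downstream of) a cycle.
    if len(order) != len(stages):
        stuck = sorted(name for name, deg in indegree.items() if deg > 0)
        raise ValueError(f"circular column dependency among stages: {stuck}")
    return order


if __name__ == "__main__":
    linear = [
        ("tokenizer", ["text"], ["words"]),
        ("hashingTF", ["words"], ["features"]),
        ("lr", ["features", "label"], ["prediction"]),
    ]
    print(topological_order(linear))  # ['tokenizer', 'hashingTF', 'lr']

    circular = [("a", ["y"], ["x"]), ("b", ["x"], ["y"])]
    try:
        topological_order(circular)
    except ValueError as err:
        print(err)  # circular column dependency among stages: ['a', 'b']
```

Kahn's algorithm is used here because, when it gets stuck, the stages left with unmet dependencies directly identify where the cycle is, which is the kind of early error message the question is asking about.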

Thanks~

Qingsheng (Patrick) Ren
