spark-user mailing list archives

From: Qingsheng Ren
Subject: [Spark MLlib]: Multiple input dataframes and non-linear ML pipeline
Date: Thu, 09 Apr 2020 08:47:52 GMT

Hi all,

I'm using ML Pipeline to construct a flow of transformations. I'm wondering
whether it is possible to pass multiple DataFrames as input to a single
transformer. For example, I need to join two DataFrames in a transformer and
then feed the result into an estimator for training. If this is not possible,
is there any plan to support it in the future?

My other question is about non-linear pipelines. Since we can arbitrarily
assign the input and output columns of each pipeline stage, what happens if I
build a problematic DAG (for example, a circular one)? Is there any mechanism
to prevent this from happening?
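
To make the circular case concrete, here is a minimal sketch in plain Python
(not Spark code) of the kind of check I have in mind. It assumes each stage can
be described only by the columns it reads and the columns it writes. The
`stages` mapping and the `find_cycle` helper are hypothetical names of my own
and are not part of the Spark API; I am not claiming Spark does or does not
perform such a check.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stage description: stage name -> (input columns, output columns).
Stages = Dict[str, Tuple[List[str], List[str]]]


def find_cycle(stages: Stages) -> Optional[List[str]]:
    """Return stage names forming a dependency cycle, or None if there is none."""
    # Map each column to the stages that produce it.
    producers: Dict[str, List[str]] = {}
    for name, (_, outputs) in stages.items():
        for col in outputs:
            producers.setdefault(col, []).append(name)

    # Stage A depends on stage B if A reads a column that B writes.
    deps = {
        name: [p for col in inputs for p in producers.get(col, [])]
        for name, (inputs, _) in stages.items()
    }

    # Depth-first search with three states: unvisited, in progress, done.
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {name: UNVISITED for name in stages}
    path: List[str] = []

    def visit(name: str) -> Optional[List[str]]:
        state[name] = IN_PROGRESS
        path.append(name)
        for dep in deps[name]:
            if state[dep] == IN_PROGRESS:
                # Reached a stage that is still on the current path: a cycle.
                return path[path.index(dep):] + [dep]
            if state[dep] == UNVISITED:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        state[name] = DONE
        return None

    for name in stages:
        if state[name] == UNVISITED:
            cycle = visit(name)
            if cycle:
                return cycle
    return None


# A linear flow: text -> words -> features.
linear = {
    "tokenizer": (["text"], ["words"]),
    "hashing_tf": (["words"], ["features"]),
}

# A circular flow: stage "a" needs column y, which only "b" produces,
# while "b" needs column x, which only "a" produces.
circular = {
    "a": (["y"], ["x"]),
    "b": (["x"], ["y"]),
}

print(find_cycle(linear))
print(find_cycle(circular))
```

In this sketch a stage that reads and writes the same column is also reported
as a cycle, which may or may not be the desired behaviour.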


Qingsheng (Patrick) Ren
