spark-user mailing list archives

From Yanbo Liang <yblia...@gmail.com>
Subject Re: SparkML Using Pipeline API locally on driver
Date Mon, 29 Feb 2016 06:24:36 GMT
Hi Jean,

A DataFrame is tied to a SQLContext, which in turn is tied to a
SparkContext, so I think it's impossible to run `model.transform` without
touching Spark.
I think what you need is for the model to support prediction on a single
instance; then you could make predictions without Spark. You can track the
progress at https://issues.apache.org/jira/browse/SPARK-10413.
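
To make "prediction on a single instance" concrete, here is a rough sketch in
plain Java of what it amounts to. It is not Spark code: LocalForest, Node and
example() are hypothetical names made up for illustration, the trees are
hard-coded rather than trained, and majority voting is assumed as the way the
trees are combined.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a trained forest; not part of any Spark API.
final class LocalForest {

    /** A binary tree node: either a leaf holding a label or a split on one feature. */
    static final class Node {
        final int featureIndex;  // feature tested at a split (unused for leaves)
        final double threshold;  // go left when features[featureIndex] <= threshold
        final Node left;         // null for a leaf
        final Node right;        // null for a leaf
        final double prediction; // label returned by a leaf

        private Node(int featureIndex, double threshold, Node left, Node right, double prediction) {
            this.featureIndex = featureIndex;
            this.threshold = threshold;
            this.left = left;
            this.right = right;
            this.prediction = prediction;
        }

        static Node leaf(double prediction) {
            return new Node(-1, 0.0, null, null, prediction);
        }

        static Node split(int featureIndex, double threshold, Node left, Node right) {
            return new Node(featureIndex, threshold, left, right, Double.NaN);
        }

        /** Walks from this node down to a leaf for one feature vector. */
        double predict(double[] features) {
            Node node = this;
            while (node.left != null) {
                node = features[node.featureIndex] <= node.threshold ? node.left : node.right;
            }
            return node.prediction;
        }
    }

    private final List<Node> trees;

    LocalForest(List<Node> trees) {
        this.trees = trees;
    }

    /** Majority vote over all trees for a single instance; ties go to the smaller label. */
    double predict(double[] features) {
        Map<Double, Integer> votes = new HashMap<>();
        for (Node tree : trees) {
            votes.merge(tree.predict(features), 1, Integer::sum);
        }
        double best = Double.NaN;
        int bestCount = -1;
        for (Map.Entry<Double, Integer> e : votes.entrySet()) {
            boolean moreVotes = e.getValue() > bestCount;
            boolean tieSmallerLabel = e.getValue() == bestCount && e.getKey() < best;
            if (moreVotes || tieSmallerLabel) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    /** Three hand-written stumps, purely for illustration. */
    static LocalForest example() {
        return new LocalForest(Arrays.asList(
                Node.split(0, 0.5, Node.leaf(0.0), Node.leaf(1.0)),
                Node.split(1, 2.0, Node.leaf(0.0), Node.leaf(1.0)),
                Node.split(0, 1.5, Node.leaf(0.0), Node.leaf(1.0))));
    }

    public static void main(String[] args) {
        // One feature vector in, one label out; no SparkContext or DataFrame involved.
        System.out.println(LocalForest.example().predict(new double[] {1.0, 3.0}));
    }
}
```

The point is only that predict takes one feature vector and returns one label
inside the web-service JVM, which is what the old mllib
`model.predict(point.features())` call gives you.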

Thanks
Yanbo

2016-02-27 8:52 GMT+08:00 Eugene Morozov <evgeny.a.morozov@gmail.com>:

> Hi everyone.
>
> I have a requirement to run prediction for a random forest model locally on
> a web service without touching Spark at all in some specific cases. I've
> achieved that with the previous mllib API (Java 8 syntax):
>
>     public List<Tuple2<Double, Double>> predictLocally(
>             RandomForestModel model, List<LabeledPoint> data) {
>         return data.stream()
>                 .map(point -> new Tuple2<>(model.predict(point.features()), point.label()))
>                 .collect(Collectors.toList());
>     }
>
> So I have an instance of the trained model and can use it any way I want.
> The question is whether it's possible to run this on the driver itself
> with the following:
> DataFrame predictions = model.transform(test);
> because AFAIU test has to be a DataFrame, which means it's going to be run
> on the cluster.
>
> The use case for running it on the driver is a very small amount of data for
> prediction: it is much faster to handle it this way than to use a Spark cluster.
> Thank you.
> --
> Be well!
> Jean Morozov
>
