spark-user mailing list archives

From Eugene Morozov <evgeny.a.moro...@gmail.com>
Subject SparkML Using Pipeline API locally on driver
Date Sat, 27 Feb 2016 00:52:07 GMT
Hi everyone.

I need to run predictions for a random forest model locally in a
web service, without touching Spark at all, in some specific cases. I've
achieved that with the older mllib API (Java 8 syntax):

    public List<Tuple2<Double, Double>> predictLocally(
            RandomForestModel model, List<LabeledPoint> data) {
        return data.stream()
                .map(point -> new Tuple2<>(model.predict(point.features()), point.label()))
                .collect(Collectors.toList());
    }
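
For illustration, the same pattern can be sketched without any Spark types
on the classpath. This is only a minimal sketch: Point, PredictionAndLabel
and the toy predictor are hypothetical stand-ins for LabeledPoint, Tuple2
and a trained RandomForestModel, not Spark classes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleFunction;
import java.util.stream.Collectors;

final class LocalPredictionSketch {

    /** Hypothetical stand-in for LabeledPoint: a label plus dense features. */
    static final class Point {
        final double label;
        final double[] features;

        Point(double label, double... features) {
            this.label = label;
            this.features = features;
        }
    }

    /** Hypothetical stand-in for Tuple2<Double, Double>. */
    static final class PredictionAndLabel {
        final double prediction;
        final double label;

        PredictionAndLabel(double prediction, double label) {
            this.prediction = prediction;
            this.label = label;
        }
    }

    /** Applies a per-row predictor to an in-memory list, in the calling JVM only. */
    static List<PredictionAndLabel> predictLocally(
            ToDoubleFunction<double[]> predictor, List<Point> data) {
        return data.stream()
                .map(p -> new PredictionAndLabel(predictor.applyAsDouble(p.features), p.label))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Toy predictor (an assumption, not a real forest):
        // class 1.0 when the first feature exceeds 0.5, otherwise 0.0.
        ToDoubleFunction<double[]> toyModel = f -> f[0] > 0.5 ? 1.0 : 0.0;

        List<Point> data = Arrays.asList(
                new Point(1.0, 0.9, 0.1),
                new Point(0.0, 0.2, 0.7));

        for (PredictionAndLabel r : predictLocally(toyModel, data)) {
            System.out.println(r.prediction + " vs " + r.label);
        }
    }
}
```

The only thing the method needs from the model is a function from one
feature vector to one prediction, which is what model.predict gives me in
the mllib version above.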

So I have an instance of the trained model and can use it any way I want.
The question is whether it's possible to do the same on the driver itself
with the Pipeline API, i.e. with the following:
DataFrame predictions = model.transform(test);
As far as I understand, test has to be a DataFrame, which means the
prediction is going to run on the cluster.

The use case for running it on the driver is a very small amount of data for
prediction: it is much faster to handle it this way than to use the Spark
cluster.
Thank you.
--
Be well!
Jean Morozov