spark-user mailing list archives

From Matheus Braun Magrin <>
Subject Error when loading saved ml model on pyspark (2.0.1)
Date Tue, 31 Jan 2017 10:39:08 GMT
Hi there,

I've posted this question on StackOverflow as well, but got no answers;
maybe you can help me out.

I'm building a Random Forest model using Spark and I want to save it to use
again later. I'm running this on pyspark (Spark 2.0.1) without HDFS, so the
files are saved to the local file system.

I've tried to do it like so:

import pyspark.sql.types as T
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier

data = [[0, 0, 0.],
        [0, 1, 1.],
        [1, 0, 1.],
        [1, 1, 0.]]

schema = T.StructType([
    T.StructField('a', T.IntegerType(), True),
    T.StructField('b', T.IntegerType(), True),
    T.StructField('label', T.DoubleType(), True)])

df = sqlContext.createDataFrame(data, schema)

assembler = VectorAssembler(inputCols=['a', 'b'], outputCol='features')
df = assembler.transform(df)

classifier = RandomForestClassifier(numTrees=10, maxDepth=15,
                                    labelCol='label', featuresCol='features')
model = classifier.fit(df)
model.save('saved_model')


And then, to load the model:

from pyspark.ml.classification import RandomForestClassificationModel

loaded_model = RandomForestClassificationModel.load('saved_model')

But I get this error:

Py4JJavaError: An error occurred while calling o108.load.
: java.lang.UnsupportedOperationException: empty collection

I'm not sure which collection it is referring to. Any ideas on how to
properly save (or load) the model?

Matheus Braun Magrin
