spark-user mailing list archives

From Zakaria Hili <zakah...@gmail.com>
Subject Can't generate model for prediction
Date Thu, 11 Aug 2016 08:18:39 GMT
Hi,

I have found that Spark can't save a generated model to HDFS (I used random
forest regression and linear regression for this test).
It saves only the data directory, as you can see in the picture below:

[image: Inline image 1]

But to load a model I also need the contents of the metadata directory.

When I test this application on my Windows file system, it works perfectly
(the save generates two folders: metadata and data).

The error:

16/08/11 10:03:10 INFO Methods.RandomForestRegression: Model Saved successfuly
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://10.15.0.144:8020/user/ubuntu/ModelPrediction/metadata
        at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:207)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
        at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1277)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
        at org.apache.spark.rdd.RDD.take(RDD.scala:1272)
        at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1312)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
        at org.apache.spark.rdd.RDD.first(RDD.scala:1311)
        at org.apache.spark.mllib.util.Loader$.loadMetadata(modelSaveLoad.scala:129)
        at org.apache.spark.mllib.tree.model.RandomForestModel$.load(treeEnsembleModels.scala:88)
        at org.apache.spark.mllib.tree.model.RandomForestModel.load(treeEnsembleModels.scala)
        at Analytics.Methods.RandomForestRegression.generateModel(RandomForestRegression.java:171)
        at Analytics.Main.main(Main.java:100)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/08/11 10:03:10 INFO ui.SparkUI: Stopped Spark web UI at http://10.0.2.8:4040
16/08/11 10:03:10 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
16/08/11 10:03:10 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
16/08/11 10:03:10 INFO cluster.YarnClientSchedulerBackend: Asking each executor to shut down
16/08/11 10:03:10 INFO cluster.YarnClientSchedulerBackend: Stopped
16/08/11 10:03:10 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/08/11 10:03:10 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
16/08/11 10:03:10 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/08/11 10:03:10 INFO util.ShutdownHookManager: Shutdown hook called


My code:

...

Double testMSE =
    predictionAndLabel.map(new Function<Tuple2<Double, Double>, Double>() {
        private static final long serialVersionUID = -2599901384333786032L;

        @Override
        public Double call(Tuple2<Double, Double> pl) {
            Double diff = pl._1() - pl._2();
            return diff * diff;
        }
    }).reduce(new Function2<Double, Double, Double>() {
        private static final long serialVersionUID = -2714650221453068489L;

        @Override
        public Double call(Double a, Double b) {
            return a + b;
        }
    }) / trainingData.count();

// Save and load model
model.save(sc.sc(), output);
Logger.getLogger(RandomForestRegression.class).info("Model Saved successfuly");
// line 171:
RandomForestModel.load(sc.sc(), output);
Logger.getLogger(RandomForestRegression.class).info("Test Load Model: Success");
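As a side note, the map/reduce above just computes a mean squared error. A minimal plain-Java sketch of the same arithmetic, with hypothetical arrays standing in for the predictionAndLabel pairs (no Spark involved), would be:

```java
public class MseSketch {
    // Same arithmetic as the Spark map/reduce above: sum of squared
    // differences between prediction and label, divided by the count.
    static double mse(double[] predictions, double[] labels) {
        double sum = 0.0;
        for (int i = 0; i < predictions.length; i++) {
            double diff = predictions[i] - labels[i];
            sum += diff * diff;
        }
        return sum / predictions.length;
    }

    public static void main(String[] args) {
        double[] predictions = {1.0, 2.0, 3.0};
        double[] labels = {1.0, 2.5, 2.0};
        // (0.0 + 0.25 + 1.0) / 3
        System.out.println(mse(predictions, labels));
    }
}
```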
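To fail fast with a clearer message, I could check for both expected subdirectories before calling load. This is only a sketch for a local path using java.nio.file; for an hdfs:// URI the equivalent check would go through the Hadoop FileSystem API instead (not shown here):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ModelDirCheck {
    // A saved MLlib model directory is expected to contain both
    // "data" and "metadata" subdirectories.
    static boolean hasModelLayout(Path output) {
        return Files.isDirectory(output.resolve("data"))
            && Files.isDirectory(output.resolve("metadata"));
    }

    public static void main(String[] args) throws IOException {
        Path output = Files.createTempDirectory("ModelPrediction");
        Files.createDirectory(output.resolve("data")); // only "data", as in the HDFS case
        System.out.println(hasModelLayout(output));    // false: metadata is missing

        Files.createDirectory(output.resolve("metadata"));
        System.out.println(hasModelLayout(output));    // true: both directories exist
    }
}
```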


------
regards
zakaria

