spark-user mailing list archives

From: Villu Ruusmann
Subject: Re: pmml with augustus
Date: Wed, 11 Jun 2014 17:36:49 GMT

Hello Spark/PMML enthusiasts,

It's pretty trivial to integrate the JPMML-Evaluator library with Spark. In
brief, take the following steps in your Spark application code:
1) Create a Java Map ("arguments") that represents the input data record.
You need to specify a key-value mapping for every active MiningField. The
key type is org.jpmml.evaluator.FieldName. The value type can be String or
any Java primitive data type that can be converted to the requested PMML
data type.
2) Obtain an instance of org.jpmml.evaluator.Evaluator. Invoke its
#evaluate(Map<FieldName, ?>) method using the argument map created in step
1.
3) Process the Java Map ("results") that represents the output data record.
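
To make the three steps concrete without pulling in any dependencies, here is a minimal sketch that uses stand-in types only. RecordEvaluator and TOY_MODEL are hypothetical names invented for this illustration, and plain String keys take the place of FieldName; none of this is JPMML-Evaluator API. It only shows the map-in, map-out shape of the interaction:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EvaluatorSketch {

    // Hypothetical stand-in for an evaluator (NOT the JPMML interface):
    // one input record goes in as a map, one output record comes back as a map.
    interface RecordEvaluator {
        Map<String, Object> evaluate(Map<String, ?> arguments);
    }

    // Toy "model", assumed for illustration only: y = 2 * x1 + 3 * x2.
    static final RecordEvaluator TOY_MODEL = arguments -> {
        double x1 = toDouble(arguments.get("x1"));
        double x2 = toDouble(arguments.get("x2"));
        Map<String, Object> results = new LinkedHashMap<>();
        results.put("y", 2 * x1 + 3 * x2);
        return results;
    };

    // Accepts either a String or a boxed number and converts it to double,
    // mimicking the value conversion described in step 1.
    static double toDouble(Object value) {
        if (value instanceof Number) {
            return ((Number) value).doubleValue();
        }
        return Double.parseDouble(String.valueOf(value));
    }

    public static void main(String[] args) {
        // Step 1: one key-value mapping per active input field.
        Map<String, Object> arguments = new LinkedHashMap<>();
        arguments.put("x1", "1.5"); // String value
        arguments.put("x2", 2);     // boxed primitive value

        // Step 2: evaluate the record.
        Map<String, Object> results = TOY_MODEL.evaluate(arguments);

        // Step 3: process the output record.
        System.out.println("y = " + results.get("y"));
    }
}
```

With the real library, the keys would be FieldName instances and the evaluator would be an org.jpmml.evaluator.Evaluator obtained as described in step 2.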

Putting it all together:
JavaRDD<Map<FieldName, String>> arguments = ...
final ModelEvaluator<?> modelEvaluator =
	... ModelEvaluatorFactory.getInstance()); // See the JPMML-Evaluator documentation
JavaRDD<Map<FieldName, ?>> results = arguments.flatMap(new
FlatMapFunction<Map<FieldName, String>, Map<FieldName, ?>>(){

	@Override
	public Iterable<Map<FieldName, ?>> call(Map<FieldName, String> arguments){
		// Score one input record; each record yields exactly one result map
		Map<FieldName, ?> result = modelEvaluator.evaluate(arguments);
		return Collections.<Map<FieldName, ?>>singletonList(result);
	}
});

Of course, it's not very elegant to be using JavaRDD<Map<K, V>> here.
Maybe someone can give me a hint about making it look and feel more Spark-y?
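
One possible simplification: since every input record yields exactly one result, a flatMap over singleton lists does the same work as a plain one-to-one map, and JavaRDD also offers a map method. The sketch below illustrates the equivalence with java.util.stream as a stand-in for an RDD, so that it runs without Spark. SCORE is a hypothetical placeholder for the modelEvaluator.evaluate call; this is an illustration of the idea, not tested Spark code:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class MapVersusFlatMap {

    // Hypothetical per-record function standing in for modelEvaluator.evaluate(...).
    static final Function<Integer, Integer> SCORE = x -> x * x;

    // One-to-one transformation written as flatMap over singleton lists,
    // mirroring the shape of the snippet above.
    static List<Integer> viaFlatMap(List<Integer> records) {
        return records.stream()
            .flatMap(r -> Collections.singletonList(SCORE.apply(r)).stream())
            .collect(Collectors.toList());
    }

    // The same transformation written as a plain map.
    static List<Integer> viaMap(List<Integer> records) {
        return records.stream()
            .map(SCORE)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> records = Arrays.asList(1, 2, 3);
        // Both variants produce the same results in the same order.
        System.out.println(viaFlatMap(records).equals(viaMap(records)));
    }
}
```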

Also, I would like to refute the earlier comment by @pacoid that
JPMML-Evaluator compares poorly against Augustus and Zementis products.
First, JPMML-Evaluator fully supports PMML specification versions 3.0
through 4.2. I would specifically stress the support for PMML 4.2, which was
released just a few months ago. Second, JPMML is open source. Perhaps its
licensing terms could be more liberal, but it's nevertheless the most open
and approachable way of bringing Java and PMML together.

