spark-user mailing list archives

From Daedalus <tushar.nagara...@gmail.com>
Subject Persistent Local Node variables
Date Mon, 23 Jun 2014 05:34:57 GMT
*TL;DR:* I want to run a pre-processing step on the data from each partition
(such as parsing) and retain the parsed object on each node for future
processing calls to avoid repeated parsing.

/More detail:/

I have a server and two nodes in my cluster, and the data is stored and
partitioned in HDFS.
I am trying to use Spark to process the data and send back the results.

The data is available as text. I would like to parse this text once and
then run further processing on the parsed result.
To do this, I make a simple call:
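import java.util.Iterator;
import org.apache.spark.api.java.function.VoidFunction;

// rdd is the JavaRDD<String> holding the text read from HDFS
rdd.foreachPartition(new VoidFunction<Iterator<String>>() {
	@Override
	public void call(Iterator<String> i) {
		// parse this partition; p is local, so it is discarded when call() returns
		ParsedData p = new ParsedData(i);
	}
});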
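The post does not show `ParsedData`. As a minimal hypothetical sketch, assuming each text line is a comma-separated record, a class with the same constructor shape might look like this. It also illustrates one detail worth keeping in mind: the `Iterator<String>` handed to the partition function can be traversed only once, so the constructor has to drain it and keep its own copy of the parsed records. The record format and the `size()`/`process()` bodies are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the post's ParsedData, assuming comma-separated lines.
class ParsedData {
    private final List<String[]> records = new ArrayList<>();

    // Drains the iterator: a partition iterator can only be walked once,
    // so everything needed later must be copied out here.
    ParsedData(Iterator<String> lines) {
        while (lines.hasNext()) {
            records.add(lines.next().split(","));
        }
    }

    int size() {
        return records.size();
    }

    // Placeholder for the "future processing": counts the parsed fields.
    long process() {
        long fields = 0;
        for (String[] r : records) {
            fields += r.length;
        }
        return fields;
    }
}

public class ParsedDataDemo {
    public static void main(String[] args) {
        Iterator<String> partition = Arrays.asList("a,b,c", "d,e").iterator();
        ParsedData p = new ParsedData(partition);
        // prints "2 records, 5 fields"
        System.out.println(p.size() + " records, " + p.process() + " fields");
    }
}
```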

I would like to keep this ParsedData object on each node for later
processing calls, so that the data does not have to be parsed again. So in my
next call, I'd like to do something like this:

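// rdd is the same JavaRDD<String> as in the first call
rdd.foreachPartition(new VoidFunction<Iterator<String>>() {
	@Override
	public void call(Iterator<String> i) {
		// refer to the ParsedData object created in the previous call
		// (p is not in scope here; this is the part I don't know how to do)
		p.process();
		// accumulate some results
	}
});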
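Spark's built-in way to get this effect is to make the parsed objects an RDD of their own (for example with `mapPartitions`) and call `cache()` or `persist()` on it, so later actions reuse the in-memory parsed partitions instead of re-reading the text. If an object really has to live outside an RDD, one commonly used pattern is a static, per-JVM cache: a static field exists once per executor JVM, so a later task on the same executor can find what an earlier task stored there. The sketch below is hypothetical (`NodeLocalCache`, `getOrCompute` and `parse` are illustrative names, not Spark APIs) and runs without Spark, passing the partition index in explicitly. Since Spark does not promise to schedule the same partition on the same executor next time, and a lost executor loses its cache, the code must always be able to rebuild the value on a miss.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Hypothetical per-JVM cache (not a Spark API). On an executor, this static map
// would be shared by every task running in that JVM, so it must be thread-safe.
final class NodeLocalCache {
    private static final ConcurrentMap<Integer, Object> CACHE = new ConcurrentHashMap<>();

    private NodeLocalCache() {}

    // Returns the object stored for this partition index, building and storing
    // it first if this JVM has not seen the partition before.
    @SuppressWarnings("unchecked")
    static <T> T getOrCompute(int partitionId, Supplier<T> builder) {
        return (T) CACHE.computeIfAbsent(partitionId, k -> builder.get());
    }
}

public class NodeLocalCacheDemo {
    static int parses = 0;

    // Stand-in for the expensive parse step; counts how often it runs.
    static List<String> parse(Iterator<String> lines) {
        parses++;
        List<String> out = new ArrayList<>();
        lines.forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        List<String> partition0 = Arrays.asList("x", "y");
        // First "call": nothing cached yet, so partition 0 is parsed and stored.
        List<String> first = NodeLocalCache.getOrCompute(0, () -> parse(partition0.iterator()));
        // Second "call": the stored object is returned without parsing again.
        List<String> second = NodeLocalCache.getOrCompute(0, () -> parse(partition0.iterator()));
        System.out.println(parses + " " + (first == second));
    }
}
```

In this single-JVM demo the second lookup returns the same object and `parse` runs only once, so `main` prints `1 true`. On a real cluster the hit rate depends entirely on where the scheduler places each task, which is why caching the parsed RDD is usually the safer choice.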



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Persistent-Local-Node-variables-tp8104.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
