spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Joe <...@net2020.org>
Subject Re: writing to local files on a worker
Date Mon, 12 Nov 2018 04:37:27 GMT
Hello,
You could try using mapPartitions function if you can send partial data 
to your C++ program:

mapPartitions(func):
Similar to map, but runs separately on each partition (block) of the 
RDD, so /func/ must be of type Iterator<T> => Iterator<U> when running 
on an RDD of type T.

That way you can write partition data to temp file, call your C++ app, 
then delete the temp file. Of course your data would be limited to all 
rows in one partition.

Also the latest release of Spark (2.4.0) introduced barrier execution mode:
https://issues.apache.org/jira/browse/SPARK-24374

Maybe you could combine the two, just using mapPartitions will give you 
single partition data only, and your app call will be repeated on all 
nodes, not necessarily at the same time.

Spark's strong point is parallel execution, so what you're trying to do 
kind of defeats that.
But if you do not need to combine all the data before calling your app 
then you could do it.
Or you could split your job into Spark -> app -> Spark chain.
Good luck,

Joe



On 11/11/2018 02:13 PM, Steve Lewis wrote:
> I have a problem where a critical step needs to be performed by  a 
> third party c++ application. I can send or install this program on the 
> worker nodes. I can construct  a function holding all the data this 
> program needs to process. The problem is that the program is designed 
> to read and write from the local file system. I can call the program 
> from Java and read its output as  a  local file - then deleting all 
> temporary files but I doubt that it is possible to get the program to 
> read from hdfs or any shared file system.
> My question is can a function running on a worker node create 
> temporary files and pass the names of these to a local process 
> assuming everything is cleaned up after the call?
>
> -- 
> Steven M. Lewis PhD
> 4221 105th Ave NE
> Kirkland, WA 98033
> 206-384-1340 (cell)
> Skype lordjoe_com
>


---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Mime
View raw message