spark-user mailing list archives

From Jörn Franke
Subject Re: writing to local files on a worker
Date Mon, 12 Nov 2018 06:51:58 GMT
Can you use JNI to call the C++ functionality directly from Java?

Or you could wrap this in a MapReduce step outside Spark and use Hadoop Streaming (it lets you
use shell scripts as mapper and reducer).

You can also write temporary files for each partition and execute the software within a map step.
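
A minimal sketch of that temporary-file approach, in plain Java with no Spark dependency. The class name LocalToolRunner, the method runOnLocalFiles, and the file names input.txt and output.txt are hypothetical, and the sketch assumes the external program takes an input path and an output path as its last two arguments; the real tool's command line may differ.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper: not part of Spark or of the third-party tool.
final class LocalToolRunner {

    /**
     * Writes `input` to a temporary file, runs `command <inputFile> <outputFile>`
     * (an assumed calling convention), returns the contents of the output file,
     * and deletes the temporary directory afterwards.
     */
    static String runOnLocalFiles(List<String> command, String input)
            throws IOException, InterruptedException {
        // One private directory per call, so concurrent tasks on a worker do not collide.
        Path workDir = Files.createTempDirectory("spark-ext-");
        Path in = workDir.resolve("input.txt");
        Path out = workDir.resolve("output.txt");
        try {
            Files.write(in, input.getBytes(StandardCharsets.UTF_8));

            List<String> fullCommand = new ArrayList<>(command);
            fullCommand.add(in.toString());
            fullCommand.add(out.toString());

            // inheritIO() forwards the child's stdout/stderr to this JVM's streams,
            // so the child cannot block on a full pipe buffer.
            Process process = new ProcessBuilder(fullCommand)
                    .directory(workDir.toFile())
                    .inheritIO()
                    .start();
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IOException("External program exited with code " + exitCode);
            }
            return new String(Files.readAllBytes(out), StandardCharsets.UTF_8);
        } finally {
            // Delete files before the directory that contains them (deepest paths first).
            try (Stream<Path> paths = Files.walk(workDir)) {
                paths.sorted(Comparator.reverseOrder())
                     .forEach(path -> path.toFile().delete());
            }
        }
    }
}
```

In a Spark job, a helper like this could be called from the function passed to mapPartitions, once per partition or once per record. The temporary directory lives on the worker's local disk and is removed even when the external program fails.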

Generally you should not call external applications from Spark.

> On 11.11.2018 at 23:13, Steve Lewis wrote:
> I have a problem where a critical step needs to be performed by a third-party C++ application.
> I can send or install this program on the worker nodes. I can construct a function holding
> all the data this program needs to process. The problem is that the program is designed to
> read and write from the local file system. I can call the program from Java and read its output
> as a local file, then delete all temporary files, but I doubt that it is possible to get
> the program to read from HDFS or any shared file system.
> My question is: can a function running on a worker node create temporary files and pass
> the names of these to a local process, assuming everything is cleaned up after the call?
> -- 
> Steven M. Lewis PhD
> 4221 105th Ave NE
> Kirkland, WA 98033
> 206-384-1340 (cell)
> Skype lordjoe_com
