spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Steve Lewis <>
Subject Re: writing to local files on a worker
Date Mon, 12 Nov 2018 17:20:53 GMT
I have been looking at Spark-Blast which calls Blast - a well known C++
program in parallel -
In my case I have tried to translate the C++ code to Java but am not
getting the same results - it is convoluted -
I have code that will call the program and read its results - the only real
issue is the program wants local files -
their use is convoluted with many seeks so replacement with streaming will
not work -
as long as my Java code can write to a local file for the duration of one
call things can work -

I considered in memory files as long as they can be passed to another
program - I am willing to have OS specific code
So my issue is I need to write 3 files - run a program and read one output
file - then all files can be deleted -
JNI calls will be hard - this is s program not a library and it is
available for worker nodes

On Sun, Nov 11, 2018 at 10:52 PM Jörn Franke <> wrote:

> Can you use JNI to call the c++ functionality directly from Java?
> Or you wrap this into a MR step outside Spark and use Hadoop Streaming (it
> allows you to use shell scripts as mapper and reducer)?
> You can also write temporary files for each partition and execute the
> software within a map step.
> Generally you should not call external applications from Spark.
> > Am 11.11.2018 um 23:13 schrieb Steve Lewis <>:
> >
> > I have a problem where a critical step needs to be performed by  a third
> party c++ application. I can send or install this program on the worker
> nodes. I can construct  a function holding all the data this program needs
> to process. The problem is that the program is designed to read and write
> from the local file system. I can call the program from Java and read its
> output as  a  local file - then deleting all temporary files but I doubt
> that it is possible to get the program to read from hdfs or any shared file
> system.
> > My question is can a function running on a worker node create temporary
> files and pass the names of these to a local process assuming everything is
> cleaned up after the call?
> >
> > --
> > Steven M. Lewis PhD
> > 4221 105th Ave NE
> > Kirkland, WA 98033
> > 206-384-1340 (cell)
> > Skype lordjoe_com
> >

Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com

View raw message