spark-dev mailing list archives

From Annabel Melongo <>
Subject Re: Shared memory between C++ process and Spark
Date Mon, 07 Dec 2015 18:26:34 GMT
My guess is that Jia wants to run C++ on top of Spark. If that's the case, I'm afraid this
is not possible. Spark has support for Java, Python, Scala and R.
The best way to achieve this is to run your application in C++ and use the data created by
said application to do manipulation within Spark.

    On Monday, December 7, 2015 1:15 PM, Jia <> wrote:

Thanks, Dewful!
My impression is that Tachyon is a very nice in-memory file system that can connect to multiple
storage systems. However, because our data is also held in memory, I suspect that connecting to
Spark directly may be more efficient. But I definitely need to look at Tachyon more
carefully, in case it has a very efficient C++ binding mechanism.
Best Regards,
Jia
On Dec 7, 2015, at 11:46 AM, Dewful <> wrote:

Maybe looking into something like Tachyon would help. I see some sample C++ bindings, not
sure how much of the current functionality they support...

Hi, Robin,
Thanks for your reply and thanks for copying my question to the user mailing list.
Yes, we have a distributed C++ application that will store data on each node in the cluster,
and we hope to leverage Spark to do fancier analytics on that data. But we need high
performance, that’s why we want shared memory.
Suggestions will be highly appreciated!
Best Regards,
Jia
On Dec 7, 2015, at 10:54 AM, Robin East <> wrote:

-dev, +user (this is not a question about development of Spark itself so you’ll get more
answers in the user mailing list)
First up, let me say that I don’t really know how this could be done. I’m sure it would
be possible with enough tinkering, but it’s not clear what you are trying to achieve. Spark
is a distributed processing system: it has multiple JVMs running on different machines that
each run a small part of the overall processing. Unless you have some sort of idea to have
multiple C++ processes collocated with the distributed JVMs, using named memory mapped files
doesn’t make architectural sense.
-------------------------------------------------------------------------------
Robin East
Spark GraphX in Action
Michael Malak and Robin East
Manning Publications Co.

On 6 Dec 2015, at 20:43, Jia <> wrote:
Dears, for one project, I need to implement something so Spark can read data from a C++ process.

To provide high performance, I really hope to implement this through shared memory between
the C++ process and the JVM process.
It seems it may be possible to use named memory mapped files and JNI to do this, but I wonder
whether there are any existing efforts or a more efficient approach to do this?
Thank you very much!

Best Regards,
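
For readers following this thread, the JVM side of the "named memory mapped file" idea
mentioned above can be sketched with plain java.nio, without JNI. This is only a rough
illustration under stated assumptions, not something proposed in the thread: the class
name, the temp file, and the record layout (a 4-byte little-endian count followed by
8-byte doubles) are all hypothetical, and the writer below merely stands in for a C++
process that would map the same file and write the same layout. Synchronization between
writer and reader is not addressed.

```java
import java.io.IOException;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedRegionSketch {
    // Hypothetical layout: int32 count, then `count` float64 values, little-endian.
    // In the scenario from the thread, a C++ process would produce this region.
    static void writeDoubles(Path file, double[] values) throws IOException {
        long size = Integer.BYTES + (long) values.length * Double.BYTES;
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // Mapping READ_WRITE beyond the current end grows the file to `size`.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
            buf.order(ByteOrder.LITTLE_ENDIAN);
            buf.putInt(values.length);
            for (double v : values) {
                buf.putDouble(v);
            }
            buf.force(); // flush the mapped pages to the backing file
        }
    }

    // JVM-side reader: maps the file read-only and decodes the same layout.
    static double[] readDoubles(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            buf.order(ByteOrder.LITTLE_ENDIAN);
            int count = buf.getInt();
            double[] out = new double[count];
            for (int i = 0; i < count; i++) {
                out[i] = buf.getDouble();
            }
            return out;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("shared-region", ".bin");
        file.toFile().deleteOnExit();
        writeDoubles(file, new double[] {1.5, 2.5, 4.0});
        double sum = 0;
        for (double v : readDoubles(file)) {
            sum += v;
        }
        System.out.println("sum = " + sum);
    }
}
```

Fixing the byte order explicitly matters here because a native writer would normally use
the machine's byte order, while a Java ByteBuffer defaults to big-endian. As Robin's reply
points out, a mapping like this only helps when the C++ process and the JVM reading it run
on the same machine.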
