spark-dev mailing list archives

From Paul Wais <>
Subject Re: Tungsten off heap memory access for C++ libraries
Date Tue, 01 Sep 2015 21:31:58 GMT
Paul: I've worked on running C++ code on Spark at scale before (via JNA, ~200
cores) and am working on something more contribution-oriented now (via JNI). 
A few comments:
 * If you need something *today*, try JNA.  It can be slow (e.g. calling a short
native function in a tight loop, where per-call overhead dominates), but it works if you have an existing C library.
 * If you want true zero-copy nested data structures (with an explicit schema),
you probably want to look at Google Flatbuffers or Cap'n Proto.  Protobuf
parsing copies data; not sure about Avro.  However, if instances of your nested
messages fit completely in CPU cache, there might not be much benefit to zero-copy.
 * Tungsten numeric arrays and UTF-8 strings should be portable but will likely
need some special handling.  (A major benefit of Protobuf, Avro,
Flatbuffers, Capnp, etc., is that these libraries already handle endianness and
UTF-8 for C++.)
 * NB: Don't try to dive into converting (standard) Java String <->
std::string using JNI.  It's a very messy problem :)  JNI exposes strings
either as UTF-16 or as "modified UTF-8", and neither is plain UTF-8.
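Part of the "special handling" for Tungsten numeric data and strings is likely just not assuming the host's byte order and treating strings as raw UTF-8 bytes. Below is a minimal C++ sketch of that idea. It assumes the buffer holds little-endian 8-byte slots (what native byte order produces on x86) and uses a hypothetical offset/size word for strings, loosely modeled on how UnsafeRow packs variable-length fields. The function names (read_u64_le, read_f64_le, read_utf8_field, write_u64_le) are made up for illustration. None of this is a Tungsten or Spark API, and the real layout should be checked against the Spark source.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>
#include <stdexcept>
#include <string>
#include <vector>

// Read an unsigned 64-bit value stored little-endian at buf[pos..pos+8).
// Assembling it byte by byte gives the same result on any host byte order.
std::uint64_t read_u64_le(const std::vector<std::uint8_t>& buf, std::size_t pos) {
    if (buf.size() < 8 || pos > buf.size() - 8) {
        throw std::out_of_range("read_u64_le: past end of buffer");
    }
    std::uint64_t v = 0;
    for (int i = 7; i >= 0; --i) {
        v = (v << 8) | buf[pos + static_cast<std::size_t>(i)];
    }
    return v;
}

// Reinterpret the same 8 bytes as a double. This assumes the host double is
// IEEE-754 binary64, which the static_assert checks at compile time.
double read_f64_le(const std::vector<std::uint8_t>& buf, std::size_t pos) {
    static_assert(std::numeric_limits<double>::is_iec559 && sizeof(double) == 8,
                  "expected IEEE-754 binary64 doubles");
    std::uint64_t bits = read_u64_le(buf, pos);
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}

// HYPOTHETICAL variable-length field: the 8-byte word at `pos` packs
// (offset << 32) | size, pointing at UTF-8 bytes elsewhere in the buffer.
// The bytes are copied as-is; std::string neither validates nor transcodes UTF-8.
std::string read_utf8_field(const std::vector<std::uint8_t>& buf, std::size_t pos) {
    std::uint64_t word = read_u64_le(buf, pos);
    std::size_t offset = static_cast<std::size_t>(word >> 32);
    std::size_t size = static_cast<std::size_t>(word & 0xFFFFFFFFu);
    if (offset > buf.size() || size > buf.size() - offset) {
        throw std::out_of_range("read_utf8_field: bad offset/size");
    }
    return std::string(reinterpret_cast<const char*>(buf.data() + offset), size);
}

// Helper for building test buffers: store v little-endian at buf[pos..pos+8).
// The caller must size the buffer beforehand.
void write_u64_le(std::vector<std::uint8_t>& buf, std::size_t pos, std::uint64_t v) {
    assert(buf.size() >= 8 && pos <= buf.size() - 8);
    for (std::size_t i = 0; i < 8; ++i) {
        buf[pos + i] = static_cast<std::uint8_t>(v >> (8 * i));
    }
}
```

Assembling values byte by byte costs a little, but it keeps the C++ side correct on big-endian hosts too. If you only ever target little-endian x86, a plain memcpy of each slot would do. The string comes back as raw UTF-8 bytes in a std::string, so no Java String conversion happens at all, which sidesteps the JNI mess described above.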

Was there indeed a JIRA started to track this issue?  Can't find it at the
moment ...

Sent from the Apache Spark Developers List mailing list archive at
