spark-dev mailing list archives

From Paul Wais <paulw...@gmail.com>
Subject Re: Tungsten off heap memory access for C++ libraries
Date Tue, 01 Sep 2015 21:31:58 GMT
Paul: I've worked on running C++ code on Spark at scale before (via JNA, ~200
cores) and am working on something more contribution-oriented now (via JNI). 
A few comments:
 * If you need something *today*, try JNA.  It can be slow (e.g. when a short
native function is called in a tight loop, where per-call overhead
dominates) but works if you have an existing C library.
 * If you want true zero-copy nested data structures (with explicit schema),
you probably want to look at Google FlatBuffers or Cap'n Proto.  Protobuf
copies data when it parses a message; not sure about Avro.  However, if
instances of your nested messages fit completely in CPU cache, there might
not be much benefit to zero-copy.
 * Tungsten numeric arrays and UTF-8 strings should be portable but likely
need some special handling.  (A major benefit of Protobuf, Avro,
FlatBuffers, Cap'n Proto, etc., is that these libraries already handle
endianness and UTF-8 for C++.)
 * NB: Don't try to dive into messing with (standard) Java String <->
std::string using JNI.  It's a very messy problem :)
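
To make the "special handling" point concrete, here is a rough sketch of what
the C++ side might do with a raw buffer.  It assumes the buffer holds
little-endian 64-bit integers and length-prefixed UTF-8 bytes; that layout is
an assumption for illustration, not a description of Tungsten's actual
format, and the helper names (load_le_i64, read_i64_array, read_utf8) are
made up.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Decode one little-endian 64-bit integer from raw bytes.
// Works the same on big- and little-endian hosts because it
// assembles the value byte by byte instead of casting the pointer.
inline std::int64_t load_le_i64(const unsigned char* p) {
    std::uint64_t bits = 0;
    for (int i = 7; i >= 0; --i) {
        bits = (bits << 8) | p[i];
    }
    std::int64_t value;
    std::memcpy(&value, &bits, sizeof value);  // reinterpret bits as signed
    return value;
}

// Hypothetical helper: copy `count` little-endian int64 values out of a
// raw buffer (ASSUMED layout: tightly packed, 8 bytes per element).
inline std::vector<std::int64_t> read_i64_array(const unsigned char* buf,
                                                std::size_t count) {
    std::vector<std::int64_t> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        out.push_back(load_le_i64(buf + 8 * i));
    }
    return out;
}

// Hypothetical helper: wrap `len` bytes of UTF-8 in a std::string.
// The length is explicit, so no NUL terminator is assumed and embedded
// zero bytes survive.  The bytes are copied, not validated.
inline std::string read_utf8(const unsigned char* buf, std::size_t len) {
    return std::string(reinterpret_cast<const char*>(buf), len);
}
```

Note that this copies the data.  A zero-copy variant would hand out a
pointer and length instead, but then the caller has to deal with byte order
and buffer lifetime itself.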

Was there indeed a JIRA started to track this issue?  Can't find it at the
moment ...



--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Tungsten-off-heap-memory-access-for-C-libraries-tp13898p13929.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.


