spark-dev mailing list archives

From Paul Weiss <paulweiss....@gmail.com>
Subject Re: Tungsten off heap memory access for C++ libraries
Date Tue, 01 Sep 2015 23:57:26 GMT
The JIRA tracking this issue is https://issues.apache.org/jira/browse/SPARK-10399

On Sep 1, 2015 5:32 PM, "Paul Wais" <paulwais@gmail.com> wrote:

> Paul: I've worked on running C++ code on Spark at scale before (via JNA,
> ~200 cores) and am working on something more contribution-oriented now
> (via JNI). A few comments:
>  * If you need something *today*, try JNA.  It can be slow (e.g. when
> calling a short native function in a tight loop, where per-call overhead
> dominates) but works if you have an existing C library.
>  * If you want true zero-copy nested data structures (with explicit
> schema), you probably want to look at Google FlatBuffers or Cap'n Proto.
> Protobuf does copies; not sure about Avro.  However, if instances of your
> nested messages fit completely in CPU cache, there might not be much
> benefit to zero-copy.
>  * Tungsten numeric arrays and UTF-8 strings should be portable but likely
> need some special handling.  (A major benefit of Protobuf, Avro,
> FlatBuffers, Cap'n Proto, etc., is that these libraries already handle
> endianness and UTF-8 for C++.)
>  * NB: Don't try to convert (standard) Java String <-> std::string by
> hand using JNI.  It's a very messy problem :)
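
To make the "special handling" for numeric arrays concrete, here is a minimal C++ sketch of reading 64-bit integers out of a raw byte buffer without depending on the host's byte order or alignment. It assumes the producer wrote the values in little-endian order; that is an assumption about the buffer, not something established in this thread. The names load_le_i64 and decode_i64_array are hypothetical helpers, not part of Spark or any library.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Decode one 64-bit integer stored in little-endian order at `p`.
// Assembling the value byte by byte gives the same result on any host,
// and never performs an unaligned 8-byte load.
inline std::int64_t load_le_i64(const unsigned char* p) {
    std::uint64_t bits = 0;
    for (int i = 7; i >= 0; --i) {
        bits = (bits << 8) | p[i];  // p[7] ends up as the most significant byte
    }
    std::int64_t value;
    std::memcpy(&value, &bits, sizeof value);  // reinterpret the bits as signed
    return value;
}

// Hypothetical helper: decode `count` consecutive little-endian int64 values
// from `buf` (assumed layout: tightly packed, 8 bytes per element).
inline std::vector<std::int64_t> decode_i64_array(const unsigned char* buf,
                                                  std::size_t count) {
    std::vector<std::int64_t> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        out.push_back(load_le_i64(buf + 8 * i));
    }
    return out;
}
```

This copies the data, so it is not zero-copy; it only illustrates the byte-order handling that the serialization libraries mentioned above already do for you.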
>
> Was there indeed a JIRA started to track this issue?  Can't find it at the
> moment ...
