spark-user mailing list archives

From Sean Owen <so...@cloudera.com>
Subject Re: Fast HashSets & HashMaps - Spark Collection Utils
Date Thu, 15 Jan 2015 09:07:37 GMT
A recent discussion says the org.apache.spark.util.collection classes won't
be made public. However, there are many optimized collection libraries for
Java. My favorite is Koloboke:
https://github.com/OpenHFT/Koloboke/wiki/Koloboke:-roll-the-collection-implementation-with-features-you-need
Carrot HPPC is good too. The only catch is that these libraries are huge, so
you may end up configuring your build to strip out the packages you don't
need. Otherwise it's 20+ MB of code.
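
To make the idea concrete, here is a rough sketch of what these libraries do
for primitive keys: an open-addressing set of longs with linear probing, so
no boxing and no per-entry node objects. This is not Spark's OpenHashSet,
Koloboke, or HPPC code. The class name LongOpenHashSet, the helper
countDistinct, the hash mixer, and the 0.5 load factor are all assumptions
made up for illustration.

```java
// Illustrative sketch only: NOT the Spark, Koloboke, or HPPC implementation.
// All names here (LongOpenHashSet, countDistinct, mix) are hypothetical.
final class LongOpenHashSet {
    private long[] keys;     // slot storage for the primitive keys
    private boolean[] used;  // used[i] is true when slot i holds a key
    private int mask;        // capacity - 1; capacity is always a power of two
    private int size;        // number of distinct keys stored

    LongOpenHashSet(int expectedSize) {
        int capacity = 16;
        // Size the table so the expected keys fit at a load factor of 0.5.
        while (capacity < expectedSize * 2L && capacity < (1 << 30)) {
            capacity <<= 1;
        }
        allocate(capacity);
    }

    private void allocate(int capacity) {
        keys = new long[capacity];
        used = new boolean[capacity];
        mask = capacity - 1;
    }

    // Spread the bits of the key so that sequential values do not cluster.
    private static int mix(long key) {
        long h = key * 0x9E3779B97F4A7C15L;
        return (int) (h ^ (h >>> 32));
    }

    /** Adds the key; returns true if it was not already present. */
    boolean add(long key) {
        int i = mix(key) & mask;
        while (used[i]) {              // linear probing
            if (keys[i] == key) {
                return false;
            }
            i = (i + 1) & mask;
        }
        keys[i] = key;
        used[i] = true;
        if (++size * 2 > keys.length) {  // keep the load factor at or below 0.5
            grow();
        }
        return true;
    }

    boolean contains(long key) {
        int i = mix(key) & mask;
        while (used[i]) {
            if (keys[i] == key) {
                return true;
            }
            i = (i + 1) & mask;
        }
        return false;
    }

    int size() {
        return size;
    }

    // Double the table and re-insert every stored key.
    private void grow() {
        long[] oldKeys = keys;
        boolean[] oldUsed = used;
        allocate(oldKeys.length * 2);
        size = 0;
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldUsed[j]) {
                add(oldKeys[j]);
            }
        }
    }

    /** Counts the distinct values in one batch, e.g. one partition's data. */
    static int countDistinct(long[] values) {
        LongOpenHashSet set = new LongOpenHashSet(16);
        for (long v : values) {
            set.add(v);
        }
        return set.size();
    }
}
```

Inside a mapPartitions function you would build one such set per partition
and add each value from the iterator. Note that per-partition distinct
counts cannot simply be summed when the same value can appear in more than
one partition; in that case the sets themselves have to be merged.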
On Jan 15, 2015 4:05 AM, "Night Wolf" <nightwolfzor@gmail.com> wrote:

> Hi all,
>
> I'd like to leverage some of the fast Spark collection implementations in
> my own code.
>
> Particularly for doing things like distinct counts in a mapPartitions
> loop.
>
> Are there any plans to make the org.apache.spark.util.collection
> implementations public? Is there any other library out there with similar
> performance?
>
> Cheers,
> NW
>
