spark-user mailing list archives

From Imran Rashid <iras...@cloudera.com>
Subject Re: can spark take advantage of ordered data?
Date Thu, 12 Mar 2015 01:05:03 GMT
Hi Jonathan,

You might be interested in https://issues.apache.org/jira/browse/SPARK-3655
(not yet available) and https://github.com/tresata/spark-sorted (not part
of Spark, but available right now).  Hopefully that's what you are
looking for.  To the best of my knowledge, that covers what is available now
and what is being worked on.
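
For reference, the kind of join on already-sorted inputs described in the
question can be sketched in plain Python. This is only an illustration of
the general merge-join technique, not Spark code and not how SPARK-3655 or
spark-sorted implement anything; the function name sorted_merge_join and
the (key, value) tuple layout are assumptions made for the example.

```python
def sorted_merge_join(left, right):
    """Inner-join two iterables of (key, value) pairs.

    Assumes BOTH inputs are already sorted by key (hypothetical helper,
    not part of Spark). Each input is read once, front to back.
    """
    left_it, right_it = iter(left), iter(right)
    l = next(left_it, None)
    r = next(right_it, None)
    while l is not None and r is not None:
        if l[0] < r[0]:
            # Left key is behind: advance the left side.
            l = next(left_it, None)
        elif l[0] > r[0]:
            # Right key is behind: advance the right side.
            r = next(right_it, None)
        else:
            key = l[0]
            # Buffer only the right-side values that share this key.
            right_vals = []
            while r is not None and r[0] == key:
                right_vals.append(r[1])
                r = next(right_it, None)
            # Pair every left value for this key with the buffered values.
            while l is not None and l[0] == key:
                for rv in right_vals:
                    yield key, l[1], rv
                l = next(left_it, None)


left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
joined = list(sorted_merge_join(left, right))
# joined == [(2, 'b', 'x'), (2, 'b', 'y'), (2, 'c', 'x'), (2, 'c', 'y'), (4, 'd', 'w')]
```

Because both sides are consumed in key order, nothing needs to be re-sorted
or hashed, and only the values for one key at a time are held in memory.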

Imran

On Wed, Mar 11, 2015 at 4:38 PM, Jonathan Coveney <jcoveney@gmail.com>
wrote:

> Hello all,
>
> I am wondering if spark already has support for optimizations on sorted
> data and/or if such support could be added (I am comfortable dropping to a
> lower level if necessary to implement this, but I'm not sure if it is
> possible at all).
>
> Context: we have a number of data sets which are essentially already
> sorted on a key. With our current systems, we can take advantage of this to
> do a lot of analysis in a very efficient fashion...merges and joins, for
> example, can be done very efficiently, as can folds on a secondary key and
> so on.
>
> I was wondering if spark would be a fit for implementing these sorts of
> optimizations? Obviously it is sort of a niche case, but would this be
> achievable? Any pointers on where I should look?
>
