spark-user mailing list archives

From Jonathan Coveney
Subject can spark take advantage of ordered data?
Date Wed, 11 Mar 2015 21:38:04 GMT
Hello all,

I am wondering whether Spark already supports optimizations on sorted
data and, if not, whether such support could be added. (I am comfortable
dropping to a lower level if necessary to implement this, but I'm not sure
whether it is possible at all.)

Context: we have a number of data sets which are essentially already sorted
on a key. With our current systems, we can take advantage of this to do a
lot of analysis very efficiently. Merges and joins, for example, are cheap
on sorted inputs, as are folds on a secondary key and so on.
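
As an illustration of the kind of optimization meant here, the following is
a minimal sketch of a sorted-merge inner join in plain Python. It is not
Spark API and not taken from any existing system: the name merge_join, the
(key, value) pair layout, and the sample data are all assumptions made for
the example. It relies on both inputs already being sorted by key, so each
input is streamed once and only the values for the current key are buffered.

```python
from itertools import groupby
from operator import itemgetter


def merge_join(left, right):
    """Inner-join two iterables of (key, value) pairs, each pre-sorted by key.

    Hypothetical helper for illustration; assumes keys are mutually comparable.
    """
    # groupby relies on the inputs being sorted: equal keys must be adjacent.
    left_groups = groupby(left, key=itemgetter(0))
    right_groups = groupby(right, key=itemgetter(0))

    # (None, None) marks an exhausted input.
    lkey, lgroup = next(left_groups, (None, None))
    rkey, rgroup = next(right_groups, (None, None))

    while lgroup is not None and rgroup is not None:
        if lkey < rkey:
            # Left key has no match on the right; skip it.
            lkey, lgroup = next(left_groups, (None, None))
        elif lkey > rkey:
            # Right key has no match on the left; skip it.
            rkey, rgroup = next(right_groups, (None, None))
        else:
            # Same key on both sides: emit the cross product for this key only.
            right_values = [value for _, value in rgroup]
            for _, left_value in lgroup:
                for right_value in right_values:
                    yield (lkey, left_value, right_value)
            lkey, lgroup = next(left_groups, (None, None))
            rkey, rgroup = next(right_groups, (None, None))


# Made-up sample data, already sorted by key.
left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right = [(2, "x"), (3, "y"), (4, "z"), (4, "w")]
joined = list(merge_join(left, right))
```

No shuffle or hash table is needed in this sketch because the sort order
alone decides which side to advance. Whether Spark can be told that its
partitions already satisfy this precondition is the open question.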

Would Spark be a good fit for implementing these sorts of optimizations?
This is admittedly a niche case, but is it achievable? Any pointers on
where I should look?
