spark-user mailing list archives

From java8964 <>
Subject RE: can spark take advantage of ordered data?
Date Thu, 12 Mar 2015 00:39:49 GMT
At least for joins, you can implement your own partitioner to take advantage of the sorted data.
Just my 2 cents.
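
To make the partitioner idea a bit more concrete, here is a minimal sketch in plain Python of a range-style partition function. The split points (100 and 200) and the name make_range_partitioner are hypothetical, chosen only for illustration. The assumption is that if both data sets are partitioned with the same function, matching keys land in the same partition index and sorted key ranges stay contiguous. In PySpark a function like this could be passed as the partitionFunc argument of RDD.partitionBy, though whether that alone avoids a re-sort depends on the rest of the job.

```python
from bisect import bisect_right


def make_range_partitioner(boundaries):
    """Return a function mapping a key to a partition index.

    `boundaries` is a sorted list of split points (hypothetical values).
    Keys below boundaries[0] go to partition 0, keys in
    [boundaries[0], boundaries[1]) go to partition 1, and so on,
    giving len(boundaries) + 1 partitions in total.
    """
    bounds = list(boundaries)

    def partition_for(key):
        # bisect_right counts how many split points are <= key.
        return bisect_right(bounds, key)

    return partition_for


# Hypothetical split points; in practice they would come from the data.
partition_for = make_range_partitioner([100, 200])

# Two key-sorted data sets partitioned with the same function.
left_keys = [5, 150, 250]
right_keys = [7, 150, 999]
left_parts = [partition_for(k) for k in left_keys]
right_parts = [partition_for(k) for k in right_keys]
```

Because the mapping is monotonic in the key, a data set that is sorted on input stays sorted within each partition.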
Date: Wed, 11 Mar 2015 17:38:04 -0400
Subject: can spark take advantage of ordered data?

Hello all,
I am wondering if Spark already has support for optimizations on sorted data and/or if such
support could be added (I am comfortable dropping to a lower level if necessary to implement
this, but I'm not sure if it is possible at all).
Context: we have a number of data sets which are essentially already sorted on a key. With
our current systems, we can take advantage of this to do a lot of analysis in a very efficient
fashion: merges and joins, for example, can be done very efficiently, as can folds on a secondary
key and so on.
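
For readers unfamiliar with the kind of optimization meant here, the following is a minimal sketch of a sort-merge inner join in plain Python. It is not the poster's system and not Spark's implementation; the function name merge_join and the sample rows are assumptions made for illustration. It relies on both inputs already being sorted by key, so each side is scanned once with no hashing or re-sorting.

```python
def merge_join(left, right):
    """Inner-join two lists of (key, value) pairs, each already sorted by key.

    Returns a list of (key, left_value, right_value) tuples.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1  # left key has no match yet; advance left
        elif lk > rk:
            j += 1  # right key has no match yet; advance right
        else:
            # Find the run of rows on the right that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][0] == lk:
                for _, rv in right[j:j_end]:
                    out.append((lk, left[i][1], rv))
                i += 1
            j = j_end
    return out


# Hypothetical sample data, sorted by key, with duplicate keys on both sides.
left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
joined = merge_join(left, right)
```

Each input is read front to back exactly once, and apart from the cross product produced for duplicate keys, the work is linear in the combined input size.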
I was wondering if Spark would be a fit for implementing these sorts of optimizations. Obviously
it is sort of a niche case, but would this be achievable? Any pointers on where I should look?