Hi Jem,

Join time scaling linearly with the size of the big table doesn't seem that surprising to me. What were you expecting?

I assume you're doing normalRDD.join(indexedRDD). If you were to replace the indexedRDD with a normal RDD, what times do you get?

On Tue, Jan 13, 2015 at 5:35 AM, Jem Tucker <email@example.com> wrote:

Hi,

I have been playing around with the indexedRDD (https://issues.apache.org/jira/browse/SPARK-2365, https://github.com/amplab/spark-indexedrdd) and have been very impressed with its performance. Some performance testing has revealed worse than expected scaling of the join performance*, and I was just wondering if anyone else has any experience using it and what they have found?

Thanks,
Jem

* The table below shows some of my results when joining a small RDD to a large IndexedRDD. Each table consisted of a Long key and a 15-character String value. It shows an almost linear time increase with the number of rows in the bigger table.
Small Table Rows
Big Table Rows