From: Jonathan Coveney
To: User@spark.apache.org
Date: Wed, 11 Mar 2015 17:38:04 -0400
Subject: can spark take advantage of ordered data?

Hello all,

I am wondering if Spark already has support for optimizations on sorted data and/or if such support could be added (I am comfortable dropping to a lower level if necessary to implement this, but I'm not sure if it is possible at all).

Context: we have a number of data sets which are essentially already sorted on a key. With our current systems, we can take advantage of this to do a lot of analysis very efficiently: merges and joins, for example, as well as folds on a secondary key and so on.

I was wondering if Spark would be a fit for implementing these sorts of optimizations. Obviously it is sort of a niche case, but would this be achievable? Any pointers on where I should look?
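To illustrate the kind of optimization meant by "merges and joins" on already-sorted data, here is a minimal sketch in plain Python. It is not Spark code and does not use any Spark API; the function name `merge_join` and the sample data are hypothetical, chosen only for illustration. It assumes both inputs are iterables of (key, value) pairs already sorted by key, and it inner-joins them in one streaming pass, with no shuffle or hash table.

```python
from itertools import groupby
from operator import itemgetter


def merge_join(left, right):
    """Inner-join two iterables of (key, value) pairs, each already sorted by key.

    Both inputs are read once, front to back. Only the right-hand values
    for the key currently being joined are buffered in memory.
    """
    # Group consecutive records that share a key (valid because inputs are sorted).
    left_groups = groupby(left, key=itemgetter(0))
    right_groups = groupby(right, key=itemgetter(0))

    # (None, None) marks an exhausted input.
    lk, lg = next(left_groups, (None, None))
    rk, rg = next(right_groups, (None, None))

    while lg is not None and rg is not None:
        if lk < rk:
            # Left key has no partner on the right: skip it.
            lk, lg = next(left_groups, (None, None))
        elif lk > rk:
            # Right key has no partner on the left: skip it.
            rk, rg = next(right_groups, (None, None))
        else:
            # Keys match: emit the cross product of the two groups.
            right_vals = [v for _, v in rg]
            for _, lv in lg:
                for rv in right_vals:
                    yield (lk, lv, rv)
            lk, lg = next(left_groups, (None, None))
            rk, rg = next(right_groups, (None, None))


# Hypothetical sample data, sorted by key.
left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right = [(2, "x"), (3, "y"), (4, "z"), (4, "w")]
print(list(merge_join(left, right)))
# [(2, 'b', 'x'), (2, 'c', 'x'), (4, 'd', 'z'), (4, 'd', 'w')]
```

The same grouping of consecutive equal keys is what makes a fold over a secondary key cheap on sorted input: each group can be reduced as it streams past, without collecting the whole data set first.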