Date: Thu, 12 Mar 2015 16:04:36 +0100
Subject: Which is more efficient : first join three RDDs and then do filtering or vice versa?
From: shahab
To: "user@spark.apache.org"

Hi,

This question has probably been answered on the mailing list before, but I couldn't find it. Sorry for posting it again.

I need to join and filter three different RDDs, and I wonder which of the following alternatives is more efficient:

1- First join all three RDDs and then filter the resulting joined RDD, or
2- Apply the filter to each individual RDD and then join the resulting RDDs

Or is there probably no difference, due to lazy evaluation and Spark's under-the-hood optimisation?

best,
/Shahab
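
The two alternatives can be sketched on toy data as follows. This is plain Python standing in for RDDs, not Spark code: the `join` helper, the sample records, and both predicates are hypothetical and only emulate inner-join semantics on (key, value) pairs. It also assumes each filter condition depends on a single input, which is what makes alternative 2 possible in the first place.

```python
from collections import defaultdict


def join(left, right, seen):
    """Inner join of two lists of (key, value) pairs.

    Hypothetical stand-in for a pair-RDD join. `seen` is a one-element
    list used to count how many records enter the join.
    """
    seen[0] += len(left) + len(right)
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    return [(key, (lv, rv)) for key, lv in left for rv in index.get(key, ())]


# Made-up sample data: three keyed collections.
users = [(1, "alice"), (2, "bob"), (3, "carol")]
ages = [(1, 34), (2, 17), (3, 52)]
cities = [(1, "Oslo"), (2, "Lund"), (3, "Oslo")]


def is_adult(age):
    return age >= 18


def in_oslo(city):
    return city == "Oslo"


# Alternative 1: join all three collections, then filter the joined result.
seen_alt1 = [0]
joined = join(join(users, ages, seen_alt1), cities, seen_alt1)
alt1 = [(k, v) for k, v in joined if is_adult(v[0][1]) and in_oslo(v[1])]

# Alternative 2: filter each collection first, then join what is left.
seen_alt2 = [0]
adult_ages = [(k, a) for k, a in ages if is_adult(a)]
oslo_cities = [(k, c) for k, c in cities if in_oslo(c)]
alt2 = join(join(users, adult_ages, seen_alt2), oslo_cities, seen_alt2)

print(alt1 == alt2)                # same rows either way
print(seen_alt1[0], seen_alt2[0])  # records fed into the joins
```

On this toy data both alternatives return the same rows, but alternative 2 feeds fewer records into the joins. Whether a real Spark job shows the same gain depends on the data and on whether anything reorders the steps; as far as I know, the plain RDD API runs transformations in the order they are written, so the filters would not be moved ahead of the joins automatically.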