Date: Wed, 23 Dec 2015 06:37:44 -0700 (MST)
From: tiandiwoxin1234
To: user@spark.apache.org
Subject: Problem using limit clause in spark sql

Hi,

I am using Spark SQL like this:

sqlContext.sql("select * from table limit 10000").map(...).collect()

The problem is that the limit clause collects all 10,000 records into a single partition, so the map afterwards runs in only one partition and is really slow. I tried to use repartition, but it seems like a waste to collect all those records into one partition, then shuffle them around, and then collect them again. Is there a way to work around this?

BTW, there is no order by clause, and I do not care which 10,000 records I get as long as the total number is less than or equal to 10,000.
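To make this concrete, here is roughly what I have, the repartition variant I tried, and the only alternative I could think of. This is a minimal sketch assuming the Spark 1.x sqlContext from spark-shell; the table name "table", the partition count of 8, and the identity map are placeholders for my real values:

// What I have now: LIMIT funnels all 10,000 rows into one partition,
// so the map below effectively runs serially.
val slow = sqlContext.sql("select * from table limit 10000")
  .map(row => row)   // placeholder for my real per-row transformation
  .collect()

// What I tried: repartition spreads the rows back out, but only after
// they have already been collected into a single partition, so it adds
// an extra shuffle on top of the single-partition bottleneck.
val viaRepartition = sqlContext.sql("select * from table limit 10000")
  .repartition(8)    // 8 is an illustrative partition count
  .map(row => row)
  .collect()

// One direction I wondered about (not sure it is idiomatic): since any
// <= 10,000 rows will do, cap the rows taken from each partition instead
// of using a global LIMIT, so no single partition ever holds everything.
val df = sqlContext.sql("select * from table")
val perPartition = 10000 / math.max(df.rdd.partitions.length, 1)
val viaCap = df.rdd
  .mapPartitions(iter => iter.take(perPartition))
  .map(row => row)   // placeholder for my real per-row transformation
  .collect()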