spark-user mailing list archives

From tiandiwoxin1234 <tiandiwo...@icloud.com>
Subject Problem using limit clause in spark sql
Date Wed, 23 Dec 2015 13:37:44 GMT
Hi,
I am using spark sql in a way like this:

sqlContext.sql("select * from table limit 10000").map(...).collect()

The problem is that the limit clause collects all 10,000 records into a
single partition, so the map that follows runs in only one partition and is
really slow. I tried using repartition, but it seems wasteful to collect all
those records into one partition, shuffle them back out, and then collect
them again.

Is there a way to work around this? 
BTW, there is no order by clause and I do not care which 10,000 records I get,
as long as the total number is less than or equal to 10,000.
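Since any 10,000 rows will do, one possible workaround (a sketch, not a
definitive answer) is to skip the limit clause and instead cap the number of
rows taken from each partition, which keeps the work parallel. In Spark this
would look like `df.rdd.mapPartitions(_.take(perPartition))`; the snippet
below models partitions as plain Scala collections to illustrate the capping
arithmetic. The helper name and the row counts are illustrative assumptions,
not from the original post.

```scala
object LimitSketch {
  // Hypothetical helper: take at most limit/N rows from each of N
  // "partitions", so the total stays at or below `limit` without
  // funneling everything through a single partition. The Spark
  // equivalent of the flatMap/take below is
  // df.rdd.mapPartitions(_.take(perPartition)).
  def capPartitions[T](partitions: Seq[Seq[T]], limit: Int): Seq[T] = {
    val perPartition = math.max(1, limit / partitions.length)
    partitions.flatMap(_.take(perPartition))
  }

  def main(args: Array[String]): Unit = {
    // Four mock partitions of 5,000 rows each; we want any <= 10,000 rows.
    val parts = Seq.fill(4)((1 to 5000).toSeq)
    val rows = capPartitions(parts, 10000)
    println(rows.length) // 4 partitions * 2,500 rows each = 10,000
  }
}
```

Note the caveat: if there are more partitions than the limit, the
`math.max(1, ...)` floor means the total can exceed the limit, so a coalesce
or a final `take` would be needed in that case.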



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Problem-using-limit-clause-in-spark-sql-tp25789.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

