spark-dev mailing list archives

From Cheng Lian <lian.cs....@gmail.com>
Subject Re: number of partitions for hive schemaRDD
Date Thu, 26 Feb 2015 10:13:52 GMT
Hi Masaki,

I guess what you saw is the partition number of the last stage, which 
must be 1 in order to perform the global phase of LIMIT. To tune the 
partition number of normal shuffles such as joins, you may resort to 
spark.sql.shuffle.partitions.
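
For example (a rough, untested sketch against the Spark 1.x API, reusing 
the ctx from your snippet below; the partition counts 200 and 100 are 
arbitrary placeholders):

---

// Raise the partition count used by normal shuffles (joins, aggregations).
// Note: this does not affect the single-partition global LIMIT stage.
ctx.setConf("spark.sql.shuffle.partitions", "200")

// The single-partition result of a global LIMIT can only be fanned out
// again with an explicit repartition (SchemaRDD is a regular RDD):
val limited = ctx.hql(
  "select keyword from queries where dt = '2015-02-01' limit 10000000")
val spread = limited.repartition(100)

---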

Cheng

On 2/26/15 5:31 PM, masaki rikitoku wrote:
> Hi all
>
> I'm now trying Spark SQL with HiveContext.
>
> When I execute HQL like the following:
>
> ---
>
> val ctx = new org.apache.spark.sql.hive.HiveContext(sc)
> import ctx._
>
> val queries = ctx.hql("select keyword from queries where dt =
> '2015-02-01' limit 10000000")
>
> ---
>
> It seems that the number of partitions of queries is set to 1.
>
> Is this the specified behavior for SchemaRDD, Spark SQL, and HiveContext?
>
> Is there any way to set the number of partitions to an arbitrary
> value other than an explicit repartition?
>
>
> Masaki Rikitoku

