spark-user mailing list archives

From freedafeng <freedaf...@yahoo.com>
Subject What could cause number of tasks to go down from 2k to 1?
Date Fri, 30 Jan 2015 00:25:20 GMT
Hi, 

The input data has 2048 partitions. The final step loads the processed
data into HBase through saveAsNewAPIHadoopDataset(). Every step except the
last one ran in parallel across the cluster, but the last step has only 1
task, which runs on a single node using one core.
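
To see where the count collapses, I can print partitions.size after each
stage (sketch only; `input` and `reduced` stand in for my actual RDDs,
which I can't paste here):

    // If the first line prints 2048 and the second prints 1, then the
    // reduceByKey shuffle is where the parallelism disappears.
    println(s"after load:        ${input.partitions.size}")
    println(s"after reduceByKey: ${reduced.partitions.size}")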

Spark 1.1.1 + CDH 5.3.0.
Probably I should set numPartitions in the reduceByKey call to some large
number? I did not set that parameter in the current code. This reduceByKey
call is the one that runs right before the saveAsNewAPIHadoopDataset() call.
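
Concretely, something like this sketch is what I have in mind (the
word-count pipeline, table name, and column family are made up, since I
can't paste the real code; `sc` is the shell's SparkContext):

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.Put
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.hadoop.mapreduce.Job
    import org.apache.spark.SparkContext._  // pair-RDD implicits in Spark 1.1

    // Stand-in for the real job: input loaded with 2048 partitions.
    val pairs = sc.textFile("hdfs:///input", 2048).map(w => (w, 1L))

    // Passing numPartitions explicitly keeps the shuffle, and every
    // stage after it, at 2048 tasks instead of the default.
    val reduced = pairs.reduceByKey(_ + _, 2048)

    // New-API HBase output configuration (table name is made up).
    val conf = HBaseConfiguration.create()
    conf.set(TableOutputFormat.OUTPUT_TABLE, "my_table")
    val job = Job.getInstance(conf)
    job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]])

    // Convert each record to (rowkey, Put) and write; the save itself
    // runs with as many tasks as `reduced` has partitions.
    reduced.map { case (word, n) =>
      val put = new Put(Bytes.toBytes(word))
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("n"), Bytes.toBytes(n))
      (new ImmutableBytesWritable(Bytes.toBytes(word)), put)
    }.saveAsNewAPIHadoopDataset(job.getConfiguration)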

Any idea? Thanks!



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/What-could-cause-number-of-tasks-to-go-down-from-2k-to-1-tp21430.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


