spark-user mailing list archives

From freedafeng <>
Subject What could cause number of tasks to go down from 2k to 1?
Date Fri, 30 Jan 2015 00:25:20 GMT

The input data has 2048 partitions. The final step loads the processed
data into HBase through saveAsNewAPIHadoopDataset(). Every step except the
last ran in parallel across the cluster, but the last step has only 1 task,
running on a single node with one core.

Spark 1.1.1 + CDH 5.3.0.
Probably I should set the numPartitions in the reduceByKey call to some big
number? I did not set this parameter in the current code. This reduceByKey
call is the one that runs before the saveAsNewAPIHadoopDataset() call.
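For reference, a sketch of what passing numPartitions to reduceByKey looks like. This is just an illustration, not code from my job: inputRdd, mergeValues, and hbaseJobConf are placeholder names, and it assumes the usual PairRDDFunctions API on Spark 1.1.x:

```scala
// Sketch only: inputRdd is a hypothetical RDD[(K, V)], mergeValues a
// hypothetical (V, V) => V combiner, hbaseJobConf a Hadoop Configuration
// set up for the HBase TableOutputFormat.
val reduced = inputRdd.reduceByKey(mergeValues, 2048)  // explicit partition count

// The save then runs one task per partition of `reduced`,
// so 2048 tasks instead of 1.
reduced.saveAsNewAPIHadoopDataset(hbaseJobConf)
```

If numPartitions is omitted, reduceByKey falls back to the default partitioner, whose partition count can end up much smaller than the input's, which would match the collapse to a single task described above.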

Any idea? Thanks!
