spark-user mailing list archives

From Kevin Jung <itsjb.j...@samsung.com>
Subject Stucked job work well after rdd.count or rdd.collect
Date Mon, 06 Oct 2014 02:54:01 GMT
Hi, all.
I'm in an unusual situation.
The code,

...
1: val cell = dataSet.flatMap(parse(_)).cache
2: val distinctCell = cell.keyBy(_._1).reduceByKey(removeDuplication(_, _)).mapValues(_._3).cache
3: val groupedCellByLine = distinctCell.map(cellToIterableColumn).groupByKey.cache
4: val result = (1 to groupedCellByLine.map(_._2.size).max).toArray
...

gets stuck when line 4 is executed.
But if I add 'cell.collect' or 'cell.count' between line 3 and line 4, it
works fine.
I don't know why this happens.
Has anyone experienced something like this?
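
For reference, here is a minimal self-contained sketch of the workaround, with
the extra action placed between line 3 and line 4. It only shows where the
call goes; it does not explain the hang. The helper bodies, the
"line,column,value" input format, the sample rows, and the names
ForceCacheSketch and run are all assumptions made up for illustration, since
the real parse, removeDuplication and cellToIterableColumn are not shown here.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits; needed on Spark releases before 1.3

object ForceCacheSketch {
  type Cell = (String, String, String) // (cellKey, lineId, value): assumed layout

  // Hypothetical stand-in: parses "line,column,value" into zero or one cell.
  def parse(s: String): Seq[Cell] = {
    val f = s.split(",")
    if (f.length == 3) Seq((f(0) + ":" + f(1), f(0), f(2))) else Seq.empty
  }

  // Hypothetical stand-in: keeps the first of two cells sharing a key.
  def removeDuplication(a: Cell, b: Cell): Cell = a

  // Hypothetical stand-in: re-keys a (cellKey, value) pair by its line id.
  def cellToIterableColumn(kv: (String, String)): (String, String) =
    (kv._1.split(":")(0), kv._2)

  def run(sc: SparkContext, lines: Seq[String]): Array[Int] = {
    val dataSet = sc.parallelize(lines)
    val cell = dataSet.flatMap(parse(_)).cache                                   // line 1
    val distinctCell =
      cell.keyBy(_._1).reduceByKey(removeDuplication(_, _)).mapValues(_._3).cache // line 2
    val groupedCellByLine =
      distinctCell.map(cellToIterableColumn).groupByKey.cache                    // line 3
    cell.count() // the added action: computes cell now, before line 4 runs
    (1 to groupedCellByLine.map(_._2.size).max).toArray                          // line 4
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("force-cache-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val result = run(sc, Seq("1,a,x", "1,b,y", "2,a,z", "1,a,x"))
      println(result.mkString(","))
    } finally {
      sc.stop()
    }
  }
}
```

Since cache is lazy, nothing is computed on lines 1 to 3; the first action
(here cell.count) is what actually runs the job for cell.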



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Stucked-job-work-well-after-rdd-count-or-rdd-collect-tp15776.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


