Shixiong Zhu created SPARK-4824:
-----------------------------------
Summary: Join should use `Iterator` rather than `Iterable`
Key: SPARK-4824
URL: https://issues.apache.org/jira/browse/SPARK-4824
Project: Spark
Issue Type: Bug
Components: Spark Core
Reporter: Shixiong Zhu
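In Scala, `map` and `flatMap` on an `Iterable` are strict: they evaluate every element
immediately and copy the results into a new collection (a new `Seq` in the example below),
whereas the same methods on an `Iterator` are lazy. For example,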
{code}
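// Strict: map on a Seq (an Iterable) runs the function for every element immediately.
val iterable = Seq(1, 2, 3).map { v =>
  println(v)
  v
}
println("Iterable map done")
// Lazy: map on an Iterator runs nothing until the iterator is consumed.
val iterator = Seq(1, 2, 3).iterator.map { v =>
  println(v)
  v
}
println("Iterator map done")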
{code}
This prints:
{code}
1
2
3
Iterable map done
Iterator map done
{code}
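The elements are printed only for the `Iterable` version. The `Iterator` version prints
nothing, because its `map` is lazy and the iterator is never consumed. So join should use
`iterator` rather than `Iterable` when combining values, to avoid materializing an
intermediate collection and reduce the memory it consumes.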
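To illustrate the proposal, here is a minimal sketch in plain Scala rather than Spark's actual join code. The names `left`, `right`, `strictPairs`, `lazyPairs`, `computed` and `firstPair` are hypothetical; `left` and `right` stand in for the two groups of values that share one key. The sketch pairs every value of one group with every value of the other, first strictly and then through iterators.

```scala
// Hypothetical stand-ins for the two value groups of a single join key.
val left: Iterable[Int] = Seq(1, 2, 3)
val right: Iterable[String] = Seq("a", "b")

// Strict: flatMap/map on Iterable build the full cross product in memory at once.
val strictPairs: Iterable[(Int, String)] =
  for (v <- left; w <- right) yield (v, w)

// Lazy: going through .iterator yields each pair only when it is requested.
def lazyPairs: Iterator[(Int, String)] =
  for (v <- left.iterator; w <- right.iterator) yield (v, w)

// Count how many pairs are actually computed when only the first one is taken.
var computed = 0
val firstPair: (Int, String) =
  (for (v <- left.iterator; w <- right.iterator) yield {
    computed += 1
    (v, w)
  }).next()

println(s"strict pairs held in memory: ${strictPairs.size}")
println(s"pairs computed to get $firstPair lazily: $computed")
```

Both versions produce the same pairs in the same order. The difference is that the iterator version never holds the whole cross product, which matters when a key has many values on both sides.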
Found by [~johannes.simon]