spark-dev mailing list archives

From Jacek Laskowski <ja...@japila.pl>
Subject [SS] Why does ConsoleSink's addBatch convert input DataFrame to show it?
Date Fri, 07 Jul 2017 08:43:16 GMT
Hi,

I just noticed that ConsoleSink's addBatch collects the input DataFrame
to the driver and then parallelizes it again, simply to show it on the
console [1]. Why so many fairly expensive operations for a show?

I'd appreciate some help understanding this code. Thanks.

[1] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/console.scala#L51-L53
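
For readers without the source open, the pattern in question can be sketched roughly as below. This is a minimal illustration, not the actual ConsoleSink code: the object name `CollectThenShowSketch`, the helper `rebuildLocally`, and the sample data are hypothetical, and a local SparkSession is assumed.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical sketch of the collect-then-parallelize pattern; not the real ConsoleSink.
object CollectThenShowSketch {

  // Pulls every row to the driver, then redistributes the rows as a new RDD
  // and wraps them in a fresh DataFrame with the original schema.
  def rebuildLocally(data: DataFrame): DataFrame = {
    val spark = data.sparkSession
    val rows = data.collect()                       // Array[Row] held on the driver
    val rdd = spark.sparkContext.parallelize(rows)  // shipped back out as an RDD[Row]
    spark.createDataFrame(rdd, data.schema)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")                           // assumption: local mode for the demo
      .appName("collect-then-show-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, "a"), (2, "b")).toDF("id", "value")

    // The round trip being asked about: collect, parallelize, then show.
    rebuildLocally(df).show(20, truncate = false)

    // For comparison, calling show directly on a batch DataFrame.
    df.show(20, truncate = false)

    spark.stop()
  }
}
```

On a plain batch DataFrame both calls print the same table, which is what makes the extra collect and parallelize steps look redundant at first glance.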

Regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski
