spark-issues mailing list archives

From "Jungtaek Lim (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-21429) show on structured Dataset is equivalent to writeStream to console once
Date Thu, 03 May 2018 09:59:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-21429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16462225#comment-16462225
] 

Jungtaek Lim commented on SPARK-21429:
--------------------------------------

I agree that a shortcut would help, but I'm a bit afraid that such a shortcut might hide an important detail: the difference between executing a batch query and a streaming query. Unless the source data has changed, re-running a batch query has no side effects, but running a streaming query advances the source offsets. (I believe this is true even for Trigger.Once(), but please correct me if I'm missing something.)
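To make the concern concrete, here is a minimal, Spark-free sketch of the side effect in question. All names below (Source, readBatch, readStreamOnce) are hypothetical and only model the behavior described above: a batch read leaves a source's committed offset untouched, while even a single streaming trigger advances it.

```scala
// Toy model of a data source with a committed offset (not a Spark API).
final class Source(data: Vector[Int]) {
  private var committedOffset = 0

  // Batch execution: reads everything, mutates nothing.
  def readBatch(): Vector[Int] = data

  // One streaming trigger (think Trigger.Once): reads from the committed
  // offset and then commits past the data it consumed.
  def readStreamOnce(): Vector[Int] = {
    val out = data.drop(committedOffset)
    committedOffset = data.length
    out
  }

  def offset: Int = committedOffset
}

val src = new Source(Vector(1, 2, 3))
src.readBatch()
src.readBatch()
println(src.offset)      // still 0: batch reads are repeatable

src.readStreamOnce()
println(src.offset)      // 3: the single trigger committed the offset

println(src.readStreamOnce()) // Vector(): nothing left to consume
```

So a `show` that silently runs a one-shot streaming query would consume data that a later `writeStream.start()` on the same source would then never see.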

> show on structured Dataset is equivalent to writeStream to console once
> -----------------------------------------------------------------------
>
>                 Key: SPARK-21429
>                 URL: https://issues.apache.org/jira/browse/SPARK-21429
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 2.2.0
>            Reporter: Jacek Laskowski
>            Priority: Minor
>
> While working with Datasets it's often helpful to do {{show}}. It does not work for streaming Datasets (and leads to {{AnalysisException}} - see below), but I think it could just be the following under the covers, which would be very helpful (it would cut plenty of keystrokes for sure).
> {code}
> val sq = ...
> scala> sq.isStreaming
> res0: Boolean = true
> import org.apache.spark.sql.streaming.Trigger
> scala> sq.writeStream.format("console").trigger(Trigger.Once).start
> {code}
> Since {{show}} returns {{Unit}}, that could just work.
> Currently {{show}} throws an {{AnalysisException}}:
> {code}
> scala> sq.show
> org.apache.spark.sql.AnalysisException: Queries with streaming sources must be executed
with writeStream.start();;
> rate
>   at org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.org$apache$spark$sql$catalyst$analysis$UnsupportedOperationChecker$$throwError(UnsupportedOperationChecker.scala:297)
>   at org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$$anonfun$checkForBatch$1.apply(UnsupportedOperationChecker.scala:36)
>   at org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$$anonfun$checkForBatch$1.apply(UnsupportedOperationChecker.scala:34)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:126)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:126)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:126)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:126)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:126)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:126)
>   at org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.checkForBatch(UnsupportedOperationChecker.scala:34)
>   at org.apache.spark.sql.execution.QueryExecution.assertSupported(QueryExecution.scala:63)
>   at org.apache.spark.sql.execution.QueryExecution.withCachedData$lzycompute(QueryExecution.scala:74)
>   at org.apache.spark.sql.execution.QueryExecution.withCachedData(QueryExecution.scala:72)
>   at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:78)
>   at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:78)
>   at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:84)
>   at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:80)
>   at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:89)
>   at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:89)
>   at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3027)
>   at org.apache.spark.sql.Dataset.head(Dataset.scala:2340)
>   at org.apache.spark.sql.Dataset.take(Dataset.scala:2553)
>   at org.apache.spark.sql.Dataset.showString(Dataset.scala:241)
>   at org.apache.spark.sql.Dataset.show(Dataset.scala:671)
>   at org.apache.spark.sql.Dataset.show(Dataset.scala:630)
>   at org.apache.spark.sql.Dataset.show(Dataset.scala:639)
>   ... 50 elided
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

