spark-user mailing list archives

From "Sela, Amit" <ANS...@paypal.com.INVALID>
Subject Re: spark 2.0 readStream from a REST API
Date Thu, 11 Aug 2016 07:39:18 GMT
The currently available output modes are Complete and Append. Complete mode is for stateful processing
(aggregations), and Append mode is for stateless processing (i.e., map/filter). See: http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#output-modes
Dataset#writeStream produces a DataStreamWriter, which allows you to start a query. This
seems consistent with Spark’s previous behaviour of only executing upon an “action”,
and I guess queries are what “jobs” used to be.
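A minimal sketch of the stateless/Append case (the socket source and its host/port are placeholder assumptions, not from your pipeline):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("append-example").getOrCreate()
import spark.implicits._

// Stateless pipeline: no aggregation, so Append mode applies.
val lines = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()

val nonEmpty = lines.as[String].filter(_.nonEmpty)

// writeStream returns a DataStreamWriter; start() is what launches the query.
val query = nonEmpty.writeStream
  .outputMode("append")
  .format("console")
  .start()

query.awaitTermination()
```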


Thanks,
Amit

From: Ayoub Benali <benali.ayoub.info@gmail.com>
Date: Tuesday, August 2, 2016 at 11:59 AM
To: user <user@spark.apache.org>
Cc: Jacek Laskowski <jacek@japila.pl>, Amit Sela <amitsela33@gmail.com>,
Michael Armbrust <michael@databricks.com>
Subject: Re: spark 2.0 readStream from a REST API

Why is writeStream needed to consume the data?

When I tried it I got this exception:

INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
org.apache.spark.sql.AnalysisException: Complete output mode not supported when there are
no streaming aggregations on streaming DataFrames/Datasets;
at org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.org$apache$spark$sql$catalyst$analysis$UnsupportedOperationChecker$$throwError(UnsupportedOperationChecker.scala:173)
at org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.checkForStreaming(UnsupportedOperationChecker.scala:65)
at org.apache.spark.sql.streaming.StreamingQueryManager.startQuery(StreamingQueryManager.scala:236)
at org.apache.spark.sql.streaming.DataStreamWriter.start(DataStreamWriter.scala:287)
at .<init>(<console>:59)



2016-08-01 18:44 GMT+02:00 Amit Sela <amitsela33@gmail.com>:
I think you're missing:

val query = wordCounts.writeStream
  .outputMode("complete")
  .format("console")
  .start()

Did it help?
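For completeness: "complete" mode assumes wordCounts is the result of a streaming aggregation. A minimal sketch of such a pipeline (the socket source and host/port are placeholder assumptions):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("wordcount").getOrCreate()
import spark.implicits._

// Streaming source; host/port are placeholders.
val lines = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()

// The aggregation (groupBy + count) is what makes Complete mode legal;
// without it you would hit the AnalysisException above.
val wordCounts = lines.as[String]
  .flatMap(_.split(" "))
  .groupBy("value")
  .count()
```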

On Mon, Aug 1, 2016 at 2:44 PM Jacek Laskowski <jacek@japila.pl> wrote:
On Mon, Aug 1, 2016 at 11:01 AM, Ayoub Benali
<benali.ayoub.info@gmail.com> wrote:

> the problem now is that when I consume the DataFrame, for example with count,
> I get the stack trace below.

Mind sharing the entire pipeline?

> I followed the implementation of TextSocketSourceProvider to implement my
> data source, and the Text Socket source is the one used in the official
> documentation here.

Right. Completely forgot about the provider. Thanks for reminding me about it!
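For anyone reading the archive: a bare-bones provider in the spirit of TextSocketSourceProvider looks roughly like the sketch below. These are internal Spark 2.0 APIs (org.apache.spark.sql.execution.streaming), so signatures may change between versions, and RestSourceProvider / the REST semantics are made-up names for illustration, not Ayoub's actual code.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.execution.streaming.{LongOffset, Offset, Source}
import org.apache.spark.sql.sources.StreamSourceProvider
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical provider modeled on TextSocketSourceProvider.
class RestSourceProvider extends StreamSourceProvider {
  private val restSchema = StructType(StructField("value", StringType) :: Nil)

  override def sourceSchema(
      sqlContext: SQLContext,
      schema: Option[StructType],
      providerName: String,
      parameters: Map[String, String]): (String, StructType) =
    ("rest", restSchema)

  override def createSource(
      sqlContext: SQLContext,
      metadataPath: String,
      schema: Option[StructType],
      providerName: String,
      parameters: Map[String, String]): Source =
    new Source {
      private var currentOffset = -1L

      override def schema: StructType = restSchema

      // Each trigger asks for the latest offset; here every poll
      // naively advances it (a real source would track new data).
      override def getOffset: Option[Offset] = {
        currentOffset += 1
        Some(LongOffset(currentOffset))
      }

      // A real source would fetch the rows in (start, end] from the
      // REST endpoint; this sketch returns a constant row.
      override def getBatch(start: Option[Offset], end: Offset): DataFrame = {
        import sqlContext.implicits._
        Seq("payload-from-rest-call").toDF("value")
      }

      override def stop(): Unit = ()
    }
}
```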

Pozdrawiam,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org

