spark-dev mailing list archives

From "Thakrar, Jayesh" <jthak...@conversantmedia.com>
Subject Spark 2.3 V2 Datasource API questions
Date Fri, 06 Apr 2018 15:29:30 GMT
First of all, thank you to the Spark dev team for coming up with standardized and intuitive
API interfaces.
I am sure they will encourage many more new datasource integrations.

I have been playing with the API and have some questions about the continuous streaming
API
(see https://github.com/JThakrar/sparkconn#continuous-streaming-datasource ).

It seems that "commit" is never called.

query.status always shows the message below, even after the query has been initialized and data
has been streaming:
{
  "message" : "Initializing sources",
  "isDataAvailable" : false,
  "isTriggerActive" : true
}


query.recentProgress always shows an empty array:

Array[org.apache.spark.sql.streaming.StreamingQueryProgress] = Array()
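
For comparison, here is a minimal sketch of how the two checks above might be reproduced without the custom datasource. It assumes Spark 2.3 and uses the built-in "rate" source and "console" sink as stand-ins. The app name, row rate, trigger interval, and sleep duration are arbitrary illustrative choices, and the sketch makes no claim about what the calls return in continuous mode.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

// In spark-shell, getOrCreate() simply returns the existing session.
val spark = SparkSession.builder()
  .master("local[4]")
  .appName("continuous-status-check") // hypothetical name, for illustration only
  .getOrCreate()

// Assumption: the built-in rate source stands in for the custom V2 datasource.
val stream = spark.readStream
  .format("rate")
  .option("rowsPerSecond", "5")
  .load()

// Run the query in continuous mode with a 1-second checkpoint interval.
val query = stream.writeStream
  .format("console")
  .trigger(Trigger.Continuous("1 second"))
  .start()

// Give the query a few seconds to start and process some rows.
Thread.sleep(5000)

// The two calls discussed in this message.
println(query.status.prettyJson)
println(query.recentProgress.length)

query.stop()
```

If the built-in source shows the same status message and an empty progress array, the behaviour is probably in the continuous execution engine rather than in the custom datasource.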

And stopping a query always looks as if the tasks were lost involuntarily or uncleanly (even
though close on the datasource was called):
2018-04-06 08:07:10 WARN  TaskSetManager:66 - Lost task 2.0 in stage 1.0 (TID 7, localhost, executor driver): TaskKilled (Stage cancelled)
2018-04-06 08:07:10 WARN  TaskSetManager:66 - Lost task 1.0 in stage 1.0 (TID 6, localhost, executor driver): TaskKilled (Stage cancelled)
2018-04-06 08:07:10 WARN  TaskSetManager:66 - Lost task 3.0 in stage 1.0 (TID 8, localhost, executor driver): TaskKilled (Stage cancelled)
2018-04-06 08:07:10 WARN  TaskSetManager:66 - Lost task 0.0 in stage 1.0 (TID 5, localhost, executor driver): TaskKilled (Stage cancelled)
2018-04-06 08:07:10 WARN  TaskSetManager:66 - Lost task 4.0 in stage 1.0 (TID 9, localhost, executor driver): TaskKilled (Stage cancelled)

Any pointers/info will be greatly appreciated.


