spark-user mailing list archives

From M Singh <mans2si...@yahoo.com.INVALID>
Subject Re: Apache Spark - Custom structured streaming data source
Date Fri, 26 Jan 2018 15:10:11 GMT
Thanks TD. When is 2.3 scheduled for release?

    On Thursday, January 25, 2018 11:32 PM, Tathagata Das <tdas@databricks.com> wrote:
 

 Hello Mans,
The streaming DataSource APIs are still evolving and are not public yet. Hence there is no
official documentation. In fact, there is a new DataSourceV2 API (in Spark 2.3) that we are
migrating towards. So at this point in time, it's hard to make any concrete suggestions. You
can take a look at the classes DataSourceV2, DataReader, MicroBatchDataReader in the Spark
source code, along with their implementations.
Hope this helps. 
TD

On Jan 25, 2018 8:36 PM, "M Singh" <mans2singh@yahoo.com.invalid> wrote:

Hi:
I am trying to create a custom structured streaming source and would like to know if there
is any example or documentation on the steps involved.
I've looked at some of the methods available in SparkSession, but these are internal to the
sql package:
  private[sql] def internalCreateDataFrame(
      catalystRows: RDD[InternalRow],
      schema: StructType,
      isStreaming: Boolean = false): DataFrame = {
    // TODO: use MutableProjection when rowRDD is another DataFrame and the applied
    // schema differs from the existing schema on any field data type.
    val logicalPlan = LogicalRDD(
      schema.toAttributes,
      catalystRows,
      isStreaming = isStreaming)(self)
    Dataset.ofRows(self, logicalPlan)
  }
Please let me know where I can find the appropriate API or documentation.
Thanks
Mans



   