Row is a generic ordered collection of fields, most likely with a schema of StructType. You need to keep track of the data types of the fields yourself.
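
For example, here is a minimal sketch of reading such a Row inside the foreach callback (the column names "device" and "temperature" are made up for illustration):

from pyspark.sql import Row

def process_row(row):
    # Fields can be read by name or by position; the types are whatever
    # the input schema says they are, so you have to check them yourself.
    device = row["device"]        # or row.device
    temperature = row[1]          # positional access
    print(device, temperature)

# Quick driver-side check with a hand-built Row:
process_row(Row(device="sensor-1", temperature=21.5))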

If you want compile-time safety for the data types (and IntelliSense support) you need to use RDDs or the Dataset[T] API. Dataset[T] can incur overhead and break partition filter pushdown etc. if you are not careful, but it will give you compile-time errors. You still need to make sure the real underlying data types conform to the schema when you cast the DataFrame, though. There is no Dataset API for Python, however.

https://spark.apache.org/docs/2.4.2/api/java/org/apache/spark/sql/Row.html  
 
Basically you need to check the schema of your input and treat your columns accordingly.
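
As a rough sketch (assuming streamingDF from your snippet below), you can inspect the schema up front and branch on the field types:

from pyspark.sql.types import TimestampType, DoubleType

schema = streamingDF.schema        # a StructType describing each row
print(streamingDF.dtypes)          # e.g. [('device', 'string'), ('temperature', 'double')]

for field in schema.fields:
    if isinstance(field.dataType, TimestampType):
        print(field.name, "is a timestamp column")
    elif isinstance(field.dataType, DoubleType):
        print(field.name, "is a double column")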

DataType reference.
http://spark.apache.org/docs/latest/sql-reference.html  


On Sun, Jun 23, 2019 at 11:15 AM RanXin <ranxin96@163.com> wrote:
I use Spark 2.4.3 with Python to build a structured streaming query. May I know the
data type of the parameter "row" in the process_row function? The following
code shows how the official programming guide instructs us to use the foreach
function:
def process_row(row):
    # Write row to storage
    pass

query = streamingDF.writeStream.foreach(process_row).start()

Thanks a lot.



