spark-user mailing list archives

From Alex Nastetsky <alex.nastet...@vervemobile.com>
Subject dataframe json schema scan
Date Thu, 20 Aug 2015 19:35:35 GMT
The doc for the DataFrameReader#json(RDD[String]) method says:

"Unless the schema is specified using schema function, this function goes
through the input once to determine the input schema."

https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrameReader
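
To make the two cases in that sentence concrete, here is a toy model in plain Python. It is not Spark code and makes no claim about how Spark implements inference; `infer_schema` and `to_rows` are hypothetical helpers invented for this sketch, and the two sample records are made up. It only contrasts reading with an inferred schema (a scan first, then row construction) against reading with a schema supplied up front (row construction only).

```python
import json


def infer_schema(lines):
    """Scan every record once and collect each field's observed type names."""
    schema = {}
    for line in lines:
        for key, value in json.loads(line).items():
            schema.setdefault(key, set()).add(type(value).__name__)
    return schema


def to_rows(lines, columns):
    """Build fixed-width rows; fields absent from a record become None."""
    return [tuple(json.loads(line).get(col) for col in columns) for line in lines]


# Made-up sample input: the second record has a field the first one lacks.
records = ['{"a": 1}', '{"a": 2, "b": "x"}']

# No schema supplied: one pass to infer it, then a second pass to build rows.
inferred = infer_schema(records)
rows_inferred = to_rows(records, sorted(inferred))

# Schema supplied up front: the inference pass is skipped.
rows_explicit = to_rows(records, ["a", "b"])
```

In this toy model the column `b` only shows up in the second record, so the column list is not known until the scan has finished.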

Why is this extra pass necessary? Why can't it create the DataFrame at the
same time as it's determining the schema?

Thanks.
