spark-user mailing list archives

From Alex Nastetsky
Subject dataframe json schema scan
Date Thu, 20 Aug 2015 19:35:35 GMT
The doc for the DataFrameReader#json(RDD[String]) method says:

"Unless the schema is specified using schema function, this function goes
through the input once to determine the input schema."

Why is this necessary? Why can't it create the dataframe at the same time
as it's determining the schema?
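
To make the question concrete, here is a rough sketch in plain Python of the two paths the doc sentence describes. It is not Spark code and makes no claim about how Spark implements this; `infer_schema` and `to_rows` are hypothetical helpers invented for illustration, and the two-record input is made up. The assumption is that a row can only be laid out once the full column list is known.

```python
import json


def infer_schema(lines):
    """One full pass: collect every field name and the type names seen for it."""
    schema = {}
    for line in lines:
        for key, value in json.loads(line).items():
            schema.setdefault(key, set()).add(type(value).__name__)
    return schema


def to_rows(lines, columns):
    """Row-building pass: lay each record out against a fixed column list."""
    for line in lines:
        record = json.loads(line)
        # Fields missing from a record become None in that row.
        yield tuple(record.get(col) for col in columns)


# Hypothetical input: the second record has a field the first one lacks.
lines = ['{"a": 1}', '{"a": 2, "b": "x"}']

# Path 1: no schema given, so the input is read once just to learn the columns.
inferred = infer_schema(lines)
columns = sorted(inferred)
rows_inferred = list(to_rows(lines, columns))

# Path 2: schema supplied up front, so only the row-building pass runs.
rows_explicit = list(to_rows(lines, ["a", "b"]))
```

In this sketch the first row comes out as `(1, None)`, and that layout depends on field `b`, which only shows up in the second record.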

