spark-user mailing list archives

From drewrobb <drewr...@gmail.com>
Subject _SUCCESS file validation on read
Date Mon, 03 Apr 2017 20:58:21 GMT
When writing a DataFrame, a _SUCCESS file is created to mark that the entire
DataFrame has been written. However, the existence of this _SUCCESS file does
not seem to be validated by default on reads, which in some cases would allow
a partially written DataFrame to be read back. Is this behavior configurable?
Is the lack of validation intentional?

Thanks!

Here is an example from the Spark 2.1.0 shell. I would expect the read step to
fail because I've manually removed the _SUCCESS file:

scala> spark.range(10).write.save("/tmp/test")

$ rm /tmp/test/_SUCCESS

scala> spark.read.parquet("/tmp/test").show()
+---+
| id|
+---+
|  8|
|  9|
|  3|
|  4|
|  5|
|  0|
|  6|
|  7|
|  2|
|  1|
+---+
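In the meantime, one workaround is to check for the marker yourself before
reading. A minimal sketch (names are my own, not a Spark API): a hypothetical
`assertCommitted` helper that fails fast when a directory has no _SUCCESS
marker, which you could call in spark-shell just before `spark.read.parquet(dir)`.
For non-local filesystems you would use the Hadoop `FileSystem` API instead of
`java.nio.file`, but the idea is the same.

```scala
import java.nio.file.{Files, Paths}

// Hedged sketch: guard a read by first checking for the _SUCCESS marker
// that the committer drops at the top of the output directory.
// Only valid for local paths; HDFS/S3 would need org.apache.hadoop.fs.FileSystem.
def assertCommitted(dir: String): Unit = {
  val marker = Paths.get(dir, "_SUCCESS")
  require(Files.exists(marker),
    s"$dir has no _SUCCESS marker; the write may be incomplete")
}
```

Usage in the shell would then be `assertCommitted("/tmp/test")` followed by
`spark.read.parquet("/tmp/test")`, so a partially written directory raises
instead of silently returning partial data.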




