spark-dev mailing list archives

From cane <zhoukang199...@gmail.com>
Subject saveAsNewAPIHadoopDataset must not enable speculation for parquet file?
Date Tue, 03 Apr 2018 10:19:40 GMT
Currently, if we use saveAsNewAPIHadoopDataset with speculation enabled, it may
cause data loss.
I checked the comment on this API:

   * We should make sure our tasks are idempotent when speculation is enabled, i.e. do
   * not use output committer that writes data directly.
   * There is an example in https://issues.apache.org/jira/browse/SPARK-10063 to show the bad
   * result of using direct output committer with speculation enabled.
   */
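
To make the concern in that comment concrete, here is a minimal toy sketch of why a committer that writes directly to the final path is not safe when a second (speculative) attempt of the same task runs. This is not Spark or Hadoop code: the function names, file names, and the sequential ordering of the two attempts are all assumptions made for illustration. It only contrasts "write straight to the output file" with "write to a per-attempt temporary file and rename on commit".

```python
import os
import tempfile

RECORDS = ["r1", "r2", "r3", "r4"]  # hypothetical output of one task


def direct_write(path, records, die_after=None):
    """Write straight to `path`, the way a 'direct' committer would.

    Opening with mode "w" truncates whatever an earlier attempt wrote.
    `die_after` simulates the attempt being killed before record index N.
    """
    with open(path, "w") as f:
        for i, rec in enumerate(records):
            if die_after is not None and i == die_after:
                return False  # attempt killed mid-write
            f.write(rec + "\n")
    return True


def staged_write(final_path, attempt_id, records, die_after=None):
    """Write to a per-attempt temp file and rename only if the attempt succeeds."""
    tmp_path = f"{final_path}.attempt_{attempt_id}"
    if not direct_write(tmp_path, records, die_after):
        os.remove(tmp_path)  # a killed attempt never touches the final output
        return False
    os.replace(tmp_path, final_path)  # commit step
    return True


def read_lines(path):
    with open(path) as f:
        return f.read().split()


def run_demo():
    with tempfile.TemporaryDirectory() as d:
        # Direct writes: the first attempt finishes, then a speculative
        # attempt reopens the same file and is killed part-way through.
        direct = os.path.join(d, "direct-part-00000")
        direct_write(direct, RECORDS)
        direct_write(direct, RECORDS, die_after=2)
        direct_result = read_lines(direct)

        # Staged writes: same sequence, but each attempt has its own temp file.
        staged = os.path.join(d, "staged-part-00000")
        staged_write(staged, 0, RECORDS)
        staged_write(staged, 1, RECORDS, die_after=2)
        staged_result = read_lines(staged)
    return direct_result, staged_result


if __name__ == "__main__":
    direct_result, staged_result = run_demo()
    print("direct:", direct_result)
    print("staged:", staged_result)
```

In this toy model the direct variant ends up with a truncated file, while the staged variant keeps the complete output of the attempt that committed. Whether a given committer in a real job behaves like the first or the second variant is exactly the question being asked here.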

But is this a rule we must follow?
For example, for Parquet the job will get ParquetOutputCommitter.
In that case, must speculation be disabled for Parquet?

Does anyone know the history behind this?
Thanks very much!



