spark-dev mailing list archives

From Li Jin
Subject Checkpoint and recomputation
Date Fri, 03 Jan 2020 22:13:49 GMT
Hi dear devs,

I recently came across the checkpoint functionality in Spark and found,
to my slight surprise, that checkpoint causes the DataFrame to be
computed twice unless cache is called before checkpoint.

My guess is that this is probably hard to fix, and/or maybe the
checkpoint feature is not used very frequently?
