I've just been stack-trace-chasing the 404-in-task-commit code:
Although it has an org.apache.spark. prefix, it's actually org.apache.spark.sql.delta, which lives on GitHub, so the code and issue tracker live elsewhere.
However, it can be confusing and time-wasting.
Can I suggest a common prefix for third-party classes put into the spark package tree, just to make clear that they are external contributions? It would set expectations all round.
(*) Side note: Could whoever maintains that code add retries? They need sleeps of >10-15s. We ended up having to do exponential backoff of > 90s to make sure the load balancers were clean. The time for a 404 to clear is not "time since file was added", it is "time since last HEAD/GET/COPY request". thx
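
For whoever picks this up, here is a rough sketch of the kind of retry loop I mean. It is not the Delta code or any existing API: the function name `retry_on_404`, its parameters, and the use of `FileNotFoundError` to stand in for a 404 are all made up for illustration, and the delay values are only guesses based on the numbers above. The point is that every failed probe restarts the clock on the cached 404, so the sleeps have to be long and should grow between attempts rather than polling quickly.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_on_404(
    operation: Callable[[], T],
    initial_delay_s: float = 15.0,   # assumed starting point, per the >10-15s above
    max_delay_s: float = 120.0,      # assumed cap, comfortably above 90s
    max_attempts: int = 5,           # assumed limit
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), retrying with exponential backoff on FileNotFoundError.

    FileNotFoundError is a stand-in for "the store returned a 404".
    The sleep function is injectable so the loop can be tested without waiting.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    delay = initial_delay_s
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except FileNotFoundError:
            if attempt == max_attempts:
                # Out of attempts: surface the original failure.
                raise
            # Each failed probe may refresh the cached 404, so wait a long
            # time before the next one, and double the wait up to the cap.
            sleep(delay)
            delay = min(delay * 2, max_delay_s)

    # Not reachable: the loop either returns or raises.
    raise RuntimeError("retry loop exited unexpectedly")
```

With the defaults, the waits between attempts are 15s, 30s, 60s and 120s. Whether those are right for a given store and load balancer setup is something to measure, not something I'd take from this sketch.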