spark-dev mailing list archives

From Dongjoon Hyun <dongjoon.h...@gmail.com>
Subject Re: A common naming policy for third-party packages/modules under org.apache.spark?
Date Tue, 22 Sep 2020 02:58:29 GMT
Hi, Steve.

Sure, you can suggest, but I'm wondering how the suggested namespaces are
able to satisfy the existing visibility rules. Could you give us some
examples specifically?

> Can I suggest some common prefix for third-party-classes put into the
> spark package tree, just to make clear that they are external contributions?

Bests,
Dongjoon.


On Mon, Sep 21, 2020 at 6:29 AM Steve Loughran <stevel@cloudera.com.invalid>
wrote:

>
> I've just been stack-trace-chasing the 404-in-task-commit code:
>
> https://issues.apache.org/jira/browse/HADOOP-17216
>
> And although it's got an org.apache.spark. prefix, it's
> actually org.apache.spark.sql.delta, which lives in github, so the
> code/issue tracker lives elsewhere.
>
> I understand why they've done this -I've done it myself- it's to get
> classes package-scoped to spark (
> https://github.com/hortonworks-spark/cloud-integration/blob/master/spark-cloud-integration/src/main/scala/org/apache/spark/cloudera/ParallelizedWithLocalityRDD.scala
> )
>
> However, it can be confusing and time-wasting.
>
> Can I suggest some common prefix for third-party-classes put into the
> spark package tree, just to make clear that they are external
> contributions? It will set expectations up all round
>
> -Steve
>
> (*) Side note: Could whoever maintains that code do retries, which have to
> have sleeps of >10-15s? We ended up having to do exponential backoff of
> more than 90s to make sure the load balancers were clean. The time for a
> 404 to clear is not "time since file was added", it is "time since last
> HEAD/GET/COPY request". thx
>
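
A minimal sketch of the retry pattern described in the side note above, for
illustration only. It is not the Delta or Hadoop implementation. The names
retry_on_404 and backoff_delays, the 15s initial sleep, the doubling factor,
and the use of FileNotFoundError to stand in for a 404 are all assumptions.
With these defaults the sleeps are 15s, 30s and 60s, so the total wait exceeds
90s, and no request is issued while sleeping, which matters because the 404 is
said to clear based on time since the last HEAD/GET/COPY request.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(initial: float = 15.0, factor: float = 2.0,
                   retries: int = 3) -> List[float]:
    """Sleep before each retry: initial, initial*factor, initial*factor^2, ..."""
    return [initial * factor ** i for i in range(retries)]


def retry_on_404(operation: Callable[[], T],
                 initial: float = 15.0,
                 factor: float = 2.0,
                 retries: int = 3,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call operation(); on FileNotFoundError, sleep and try again.

    FileNotFoundError is a hypothetical stand-in for a 404 from the store.
    Nothing touches the object during a sleep, so each sleep is also the
    minimum quiet time since the last request.
    """
    for delay in backoff_delays(initial, factor, retries):
        try:
            return operation()
        except FileNotFoundError:
            sleep(delay)
    # Final attempt after the last sleep; any error now propagates.
    return operation()
```

The sleep function is a parameter so the schedule can be checked without
actually waiting.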
