spark-dev mailing list archives

From Shouheng Yi <sho...@microsoft.com.INVALID>
Subject [Spark Namespace]: Expanding Spark ML under Different Namespace?
Date Wed, 22 Feb 2017 20:51:19 GMT
Hi Spark developers,

Currently my team at Microsoft is extending Spark's machine learning functionality with new
learners and transformers. We would like users to be able to use these within Spark pipelines,
so that they can mix and match them with existing Spark learners/transformers and overall have
a native Spark experience. With the current implementation we cannot accomplish this from a
non-"org.apache" namespace, and we don't want to release code inside the Apache namespace
because it's confusing and there could be naming-rights issues.

We need to extend several classes from Spark that happen to be marked private[spark]. For example,
one of our classes extends VectorUDT[0], which is declared as private[spark] class VectorUDT.
This unfortunately puts us in the strange position of being forced to work under the namespace
org.apache.spark.
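For anyone unfamiliar with the mechanism: a minimal, self-contained sketch of Scala's
package-qualified access modifier, which is what private[spark] uses. The com.acme package
and Secret class below are hypothetical stand-ins, not Spark code; the point is that a
private[acme] member is accessible from any package nested under com.acme, which is why
code placed under org.apache.spark can see private[spark] members.

```scala
// Hypothetical packages illustrating package-qualified private access.
package com.acme {

  // Visible only to code inside com.acme and its sub-packages,
  // analogous to how private[spark] limits access to org.apache.spark.*.
  private[acme] class Secret {
    def msg: String = "visible inside com.acme"
  }

  package sub {
    object Inside {
      // Compiles: com.acme.sub is nested under com.acme,
      // so Secret is accessible here.
      def read: String = new Secret().msg
    }
  }
}

// From a package outside com.acme, `new com.acme.Secret` would fail
// to compile: "class Secret in package acme cannot be accessed".
```

This is exactly the situation with VectorUDT: any class we write outside org.apache.spark
cannot reference it at all, while the same class moved under org.apache.spark compiles fine.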

To be specific, the private classes/traits we currently need in order to create new Spark
learners and transformers are HasInputCol, VectorUDT and Logging. We expect this list to grow
as we develop more.

Is there a way to avoid this namespace issue? What do other people/companies do in this scenario?
Thank you for your help!

[0]: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/linalg/VectorUDT.scala

Best,
Shouheng

