spark-dev mailing list archives

From Nick Pentreath <nick.pentre...@gmail.com>
Subject Re: RDD MLLib Deprecation Question
Date Tue, 30 May 2017 13:41:32 GMT
The short answer is those distributed linalg parts will not go away.

In the medium term, it's much less likely that the distributed matrix
classes will be ported over to DataFrames, given the time and effort it
has taken just to port the various ML models and feature transformers
over to ML. The ideal, though, would be to have DataFrame-backed
distributed matrix classes.

The current distributed matrices use the old mllib linear algebra
primitives for their backing data structures and ops, so those will have
to be ported at some point to the ml package vectors & matrices, though I
would expect overall functionality to remain the same initially.

There is https://issues.apache.org/jira/browse/SPARK-15882, which
discusses some of the ideas. The decision would still need to be made on
the higher-level API (whether it remains the same as it currently is,
changes to be DF-based, and/or changes in other ways, etc.).

On Tue, 30 May 2017 at 15:33 John Compitello <johnc@broadinstitute.org>
wrote:

> Hey all,
>
> I see on the MLLib website that there are plans to deprecate the RDD based
> API for MLLib once the new ML API reaches feature parity with RDD based
> one. Are there currently plans to reimplement all the distributed linear
> algebra / matrices operations as part of this new API, or are these things
> just going away? Like, will there still be a BlockMatrix class for
> distributed multiplies?
>
> Best,
>
> John
