spark-user mailing list archives

From chagas <cha...@gta.ufrj.br>
Subject Incremental (online) machine learning algorithms on ML
Date Mon, 05 Aug 2019 12:56:05 GMT
Hi,

After searching the machine learning library for streaming (online) 
algorithms, I found two that fit: Streaming Linear Regression 
(https://spark.apache.org/docs/latest/mllib-linear-methods.html#streaming-linear-regression)

and Streaming K-Means 
(https://spark.apache.org/docs/latest/mllib-clustering.html#streaming-k-means).
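For reference, the linked Streaming K-Means page describes a per-batch update. Each new point is assigned to its nearest center. Each center c_t with weight n_t is then moved toward the mean x_t of its m_t new points, and a decay factor alpha discounts past data. Below is a minimal pure-Python sketch of that rule. It is an illustration and not Spark's code. It assumes squared Euclidean distance and also assumes that the stored weight is discounted as well (n_{t+1} = alpha * n_t + m_t). The function names are hypothetical.

```python
def assign(point, centers):
    """Return the index of the nearest center (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])))


def update_batch(centers, weights, batch, alpha=1.0):
    """One mini-batch update of streaming k-means (illustrative sketch).

    centers: list of centroid coordinate lists (c_t)
    weights: list of per-cluster weights (n_t)
    batch:   list of points in the new mini-batch
    alpha:   decay factor; 1.0 keeps all history, 0.0 uses only the new batch
    """
    k, dim = len(centers), len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    # Assign every new point to its nearest current center.
    for point in batch:
        i = assign(point, centers)
        counts[i] += 1
        for d in range(dim):
            sums[i][d] += point[d]

    new_centers, new_weights = [], []
    for i in range(k):
        old_w = weights[i] * alpha  # discounted history: alpha * n_t (assumption)
        m = counts[i]               # number of new points m_t
        if m == 0:
            # No new points: keep the center and carry the discounted weight.
            new_centers.append(list(centers[i]))
            new_weights.append(old_w)
            continue
        mean = [s / m for s in sums[i]]  # x_t: mean of the new points
        total = old_w + m
        # c_{t+1} = (c_t * alpha * n_t + x_t * m_t) / (alpha * n_t + m_t)
        new_centers.append([(c * old_w + x * m) / total
                            for c, x in zip(centers[i], mean)])
        new_weights.append(total)
    return new_centers, new_weights


centers, weights = update_batch([[0.0, 0.0], [10.0, 10.0]], [1.0, 1.0],
                                [[1.0, 1.0], [9.0, 9.0], [11.0, 11.0]])
print(centers, weights)  # [[0.5, 0.5], [10.0, 10.0]] [2.0, 3.0]
```

With alpha=1.0 all history counts equally. With alpha=0.0 each center simply jumps to the mean of its newest points.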

However, both use the RDD-based MLlib API (spark.mllib) rather than the 
DataFrame-based ML API (spark.ml). Are there any plans to bring them to ML?

Also, is there a technical reason why the machine learning library has so 
few incremental algorithms? There is only one each for regression and 
clustering, and none for classification, dimensionality reduction or 
feature extraction.

If there is a reason, how were those two algorithms implemented? If 
there isn't, what is the general consensus on adding new online machine 
learning algorithms?
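On implementation, my reading of the linked linear-methods page is that the streaming regression model is refit with SGD on each incoming batch, starting from the weights learned so far. Below is a rough sketch of that warm-started pattern. It is not Spark's code. It simplifies things in several ways: squared loss, a constant step size, and full-batch gradient steps. The function name `sgd_batch_update` is hypothetical.

```python
def sgd_batch_update(weights, batch, step_size=0.1, num_iterations=1):
    """Refine linear-regression weights on one mini-batch (illustrative sketch).

    weights: current weight vector, carried over from earlier batches
    batch:   list of (features, label) pairs; least-squares loss assumed
    """
    w = list(weights)
    n = len(batch)
    if n == 0:
        return w  # nothing new arrived; keep the current model
    for _ in range(num_iterations):
        grad = [0.0] * len(w)
        for x, y in batch:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y  # prediction error
            for j, xj in enumerate(x):
                grad[j] += err * xj
        # Average gradient over the batch, then take one descent step.
        w = [wj - step_size * gj / n for wj, gj in zip(w, grad)]
    return w


# Simulated stream drawn from y = 2x: each batch nudges the same model.
w = [0.0]
for _ in range(50):
    w = sgd_batch_update(w, [([1.0], 2.0)], step_size=0.5)
print(w)  # close to [2.0]
```

Here the model state is just the weight vector passed from one batch to the next. That is the property an online algorithm needs.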

Regards,
Lucas Chagas
