MLlib (spark.mllib) is the old RDD-based API. Since Apache Spark 2.0, the recommendation is to use the Dataset/DataFrame-based API, ML (spark.ml), to get good performance.
ML is the new API built around Datasets and ML Pipelines. The RDD-based MLlib is slowly being deprecated (this has already happened in the case of linear regression) and is currently in maintenance mode.

Vaquar khan

On Sat, Sep 23, 2017 at 4:04 PM, Koert Kuipers <> wrote:
Our main challenge has been the general lack of support for missing values.

On Sat, Sep 23, 2017 at 3:41 AM, Irfan Kabli <> wrote:
Dear All,

We are looking to position MLlib in our organisation for machine learning tasks and are keen to understand whether there are any challenges that you might have seen with MLlib in production. We will be going with the pure open-source approach here, rather than using one of the Hadoop distributions out there in the market.

Furthermore, with a multi-tenant Hadoop cluster and data in memory, would Spark support encrypting the data in memory with DataFrames?

Best Regards,
Irfan Kabli
