spark-user mailing list archives

From vaquar khan <vaquar.k...@gmail.com>
Subject Re: Apache Spark - MLLib challenges
Date Sat, 23 Sep 2017 23:36:01 GMT
MLlib's RDD-based API (the spark.mllib package) is the old one. Since Apache
Spark 2.0, the recommendation is to use the Dataset/DataFrame-based API (the
spark.ml package, usually just called ML) for better performance.

ML is the newer API, built around Datasets and ML Pipelines. The RDD-based
mllib API is slowly being deprecated (this has already happened for linear
regression) and is currently in maintenance mode.


Regards,
Vaquar khan

On Sat, Sep 23, 2017 at 4:04 PM, Koert Kuipers <koert@tresata.com> wrote:

> our main challenge has been the lack of support for missing values
> generally
>
> On Sat, Sep 23, 2017 at 3:41 AM, Irfan Kabli <irfan.kabli786@gmail.com>
> wrote:
>
>> Dear All,
>>
>> We are looking to position MLlib in our organisation for machine learning
>> tasks and are keen to understand if there are any challenges that you might
>> have seen with MLlib in production. We will be going with the pure
>> open-source approach here, rather than using one of the Hadoop
>> distributions out there in the market.
>>
>> Furthermore, with a multi-tenant Hadoop cluster, and data in memory, would
>> Spark support encrypting the data in memory with DataFrames?
>>
>> --
>> Best Regards,
>> Irfan Kabli
>>
>>
>


-- 
Regards,
Vaquar Khan
+1 -224-436-0783
Greater Chicago
