spark-user mailing list archives

From "Suzen, Mehmet" <su...@acm.org>
Subject Re: Training A ML Model on a Huge Dataframe
Date Wed, 23 Aug 2017 12:59:33 GMT
It depends on which model you would like to train, but models that are
fitted by optimisation can use SGD with mini-batches, so that each
iteration only works on a sampled fraction of the data. See:
https://spark.apache.org/docs/latest/mllib-optimization.html#stochastic-gradient-descent-sgd
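
To make the idea concrete, here is a minimal sketch of mini-batch SGD in plain Python, not Spark code. It fits a toy least-squares line, and every name in it (`minibatch_sgd`, `step_size`, `mini_batch_fraction`, the toy data) is an assumption made for illustration. `mini_batch_fraction` is only meant to mirror the `miniBatchFraction` parameter described on the page linked above. The constant step size is also a simplification chosen for this sketch.

```python
import random

def minibatch_sgd(data, step_size=0.5, num_iterations=2000,
                  mini_batch_fraction=0.1, seed=42):
    """Fit y ~ w * x + b by mini-batch SGD on the squared error.

    Hypothetical helper for illustration only; it is not part of Spark.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    # Each iteration touches only this many rows, never the whole dataset.
    batch_size = max(1, int(len(data) * mini_batch_fraction))
    for _ in range(num_iterations):
        batch = rng.sample(data, batch_size)
        # Average gradient of 0.5 * (w * x + b - y)^2 over the sampled batch.
        grad_w = sum((w * x + b - y) * x for x, y in batch) / batch_size
        grad_b = sum((w * x + b - y) for x, y in batch) / batch_size
        w -= step_size * grad_w
        b -= step_size * grad_b
    return w, b

# Toy, noise-free data generated from y = 3x + 1 (an assumption for the demo).
points = [(i / 100.0, 3.0 * (i / 100.0) + 1.0) for i in range(100)]
w, b = minibatch_sgd(points)
print(w, b)
```

With this setup the estimates should end up close to the generating values 3 and 1. The point is that every update is computed from a small sample, which is the "subset of the data in multiple steps" you asked about.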

On 23 August 2017 at 14:27, Sea aj <saj3saj@gmail.com> wrote:

> Hi,
>
> I am trying to feed a huge dataframe to a ml algorithm in Spark but it
> crashes due to the shortage of memory.
>
> Is there a way to train the model on a subset of the data in multiple
> steps?
>
> Thanks



-- 

Mehmet Süzen, MSc, PhD
<suzen@acm.org>

