spark-user mailing list archives

From "Suzen, Mehmet" <su...@acm.org>
Subject Re: Training A ML Model on a Huge Dataframe
Date Wed, 23 Aug 2017 23:07:37 GMT
SGD is supported. I see; I had assumed you were using Scala. It looks like you
can do streaming regression, though I am not sure about the PySpark API:

https://spark.apache.org/docs/latest/mllib-linear-methods.html#streaming-linear-regression
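
To make the idea of training on a subset of the data in multiple steps concrete, here is a minimal plain-Python sketch of mini-batch SGD for linear regression. It is not the Spark API: the function names (`minibatch_sgd`, `synthetic_batches`), the step size, and the synthetic data are all assumptions made for illustration. The point is only that the weights are updated from one batch at a time, so the full dataset never has to sit in memory at once.

```python
import random


def minibatch_sgd(batches, n_features, step_size=0.1, epochs=1):
    """Fit linear-regression weights by visiting one mini-batch at a time.

    `batches` is a zero-argument callable returning a fresh iterator of
    batches; each batch is a list of (features, label) pairs.
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for batch in batches():
            # Accumulate the squared-error gradient over this batch only.
            grad = [0.0] * n_features
            for features, label in batch:
                error = sum(w * x for w, x in zip(weights, features)) - label
                for j, x in enumerate(features):
                    grad[j] += error * x
            # Take one step along the averaged batch gradient.
            for j in range(n_features):
                weights[j] -= step_size * grad[j] / len(batch)
    return weights


def synthetic_batches(n_batches=200, batch_size=32, seed=0):
    """Hypothetical data source: noiseless samples of y = 2*x1 - 3*x2."""
    def generate():
        rng = random.Random(seed)
        for _ in range(n_batches):
            batch = []
            for _ in range(batch_size):
                x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
                batch.append((x, 2.0 * x[0] - 3.0 * x[1]))
            yield batch
    return generate


# The learned weights should approach the true coefficients 2 and -3.
weights = minibatch_sgd(synthetic_batches(), n_features=2, step_size=0.1, epochs=5)
print(weights)
```

In a real job the generator would read successive chunks from storage instead of generating random points, and the step size would need tuning for the data at hand.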

On 23 August 2017 at 18:22, Sea aj <saj3saj@gmail.com> wrote:

> Thanks for the reply.
>
> As far as I understand, mini-batch training is not yet supported in the ML
> library. As for mini-batch in MLlib, I could not find any PySpark API.
>
>
> On Wed, Aug 23, 2017 at 2:59 PM, Suzen, Mehmet <suzen@acm.org> wrote:
>
>> It depends on which model you would like to train, but models that require
>> optimisation could use SGD with mini-batches. See:
>> https://spark.apache.org/docs/latest/mllib-optimization.html#stochastic-gradient-descent-sgd
>>
>> On 23 August 2017 at 14:27, Sea aj <saj3saj@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am trying to feed a huge DataFrame to an ML algorithm in Spark, but it
>>> crashes because it runs out of memory.
>>>
>>> Is there a way to train the model on a subset of the data in multiple
>>> steps?
>>>
>>> Thanks
>>>
>>>
>>
>>
>>
>> --
>>
>> Mehmet Süzen, MSc, PhD
>> <suzen@acm.org>
>>
>> | PRIVILEGED AND CONFIDENTIAL COMMUNICATION This e-mail transmission, and
>> any documents, files or previous e-mail messages attached to it, may
>> contain confidential information that is legally privileged. If you are not
>> the intended recipient or a person responsible for delivering it to the
>> intended recipient, you are hereby notified that any disclosure, copying,
>> distribution or use of any of the information contained in or attached to
>> this transmission is STRICTLY PROHIBITED within the applicable law. If you
>> have received this transmission in error, please: (1) immediately notify me
>> by reply e-mail to suzen@acm.org,  and (2) destroy the original
>> transmission and its attachments without reading or saving in any manner. |
>>
>
>


-- 

Mehmet Süzen, MSc, PhD
<suzen@acm.org>

