spark-user mailing list archives

From ayan guha <guha.a...@gmail.com>
Subject Re: Apply ML to grouped dataframe
Date Tue, 23 Aug 2016 09:12:55 GMT
I would suggest you construct a toy problem and post it for a solution. At
the moment it's a little unclear what your intentions are.

Generally speaking, a group by on a data frame (followed by an aggregation)
creates another data frame, not multiple ones.
On 23 Aug 2016 16:35, "Wen Pei Yu" <yuwenp@cn.ibm.com> wrote:

> Hi Nirmal
>
> Filter works fine if I want to handle one of the grouped dataframes. But I
> have multiple grouped dataframes, and I wish I could apply an ML algorithm to
> all of them in one job, not in for loops.
>
> Wenpei.
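A minimal sketch of the "one job, not a for loop" pattern, written in plain Python so it runs without a cluster: rows are bucketed by key in a single pass and one fitting function is applied to every bucket. The "ML algorithm" here is an ordinary least-squares line fit chosen only for illustration, and `fit_line`, `fit_per_group` and the sample rows are hypothetical. In Spark the same shape roughly corresponds to `rdd.groupByKey().mapValues(fit_line)` on an RDD of `(key, (x, y))` pairs, or, in Spark 3.0 and later (not available when this thread was written), to `df.groupBy("key").applyInPandas(func, schema)`.

```python
from collections import defaultdict


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept (illustrative model)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # assumes at least two distinct x values per group
    intercept = mean_y - slope * mean_x
    return slope, intercept


def fit_per_group(rows):
    """Bucket (key, x, y) rows by key in one pass, then fit every bucket."""
    groups = defaultdict(list)
    for key, x, y in rows:
        groups[key].append((x, y))
    return {key: fit_line(points) for key, points in groups.items()}


if __name__ == "__main__":
    rows = [
        ("F", 1.0, 3.0), ("F", 2.0, 5.0), ("F", 3.0, 7.0),
        ("M", 1.0, 1.0), ("M", 2.0, 0.0), ("M", 3.0, -1.0),
    ]
    print(fit_per_group(rows))  # {'F': (2.0, 1.0), 'M': (-1.0, 2.0)}
```

The point of the design is that the data is read once and every group is handled by the same function, which is what lets Spark schedule it as one job instead of one job per group.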
>
> From: Nirmal Fernando <nirmal@wso2.com>
> To: Wen Pei Yu/China/IBM@IBMCN
> Cc: User <user@spark.apache.org>
> Date: 08/23/2016 01:55 PM
> Subject: Re: Apply ML to grouped dataframe
> ------------------------------
>
>
>
>
>
> On Tue, Aug 23, 2016 at 10:56 AM, Wen Pei Yu <yuwenp@cn.ibm.com> wrote:
>
>    We can group a dataframe by one column like
>
>    df.groupBy(df.col("gender"))
>
>
>
> On top of this DF, use a filter that would enable you to extract the
> grouped DF as separate DFs. Then you can apply ML on top of each DF.
>
> e.g.: xyzDF.filter(col("x").equalTo(x))
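The filter approach suggested above can be sketched in the same plain-Python style, to make its cost visible. It assumes the grouping column is the first field of each row; `split_by_key` and the sample rows are hypothetical. In PySpark the equivalent would be collecting `df.select("x").distinct()` and then running `df.filter(col("x") == k)` for each key `k`; each filtered subset is evaluated separately (typically at least one Spark job per group), so the work grows with the number of groups.

```python
def split_by_key(rows, key_index=0):
    """Mimic 'collect distinct keys, then one filter per key' on a list of tuples."""
    # Like df.select("x").distinct(): one pass to find the keys.
    keys = sorted({row[key_index] for row in rows})
    subsets = {}
    filter_passes = 0
    for k in keys:
        # Each filter re-reads the whole input, like one Spark filter per group.
        subsets[k] = [row for row in rows if row[key_index] == k]
        filter_passes += 1
    return subsets, filter_passes


if __name__ == "__main__":
    rows = [("F", 1), ("M", 2), ("F", 3), ("X", 4)]
    subsets, filter_passes = split_by_key(rows)
    print(subsets)        # one list of rows per key, ready to pass to a model
    print(filter_passes)  # 3: one full pass over the input per distinct key
```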
>
>
>    It is like splitting a dataframe into multiple dataframes. Currently, we
>    can only apply simple SQL functions such as agg, max, etc. to this
>    GroupedData.
>
>    What we want is to apply one ML algorithm to each group.
>
>    Regards.
>
>    From: Nirmal Fernando <nirmal@wso2.com>
>    To: Wen Pei Yu/China/IBM@IBMCN
>    Cc: User <user@spark.apache.org>
>    Date: 08/23/2016 01:14 PM
>    Subject: Re: Apply ML to grouped dataframe
>    ------------------------------
>
>
>
>    Hi Wen,
>
>    AFAIK Spark MLlib implements its machine learning algorithms on top of
>    the Spark DataFrame API. What do you mean by a grouped dataframe?
>
>    On Tue, Aug 23, 2016 at 10:42 AM, Wen Pei Yu <yuwenp@cn.ibm.com> wrote:
>       Hi Nirmal
>
>          I didn't get your point.
>          Can you tell me more about how to use MLlib on a grouped dataframe?
>
>          Regards.
>          Wenpei.
>
>          From: Nirmal Fernando <nirmal@wso2.com>
>          To: Wen Pei Yu/China/IBM@IBMCN
>          Cc: User <user@spark.apache.org>
>          Date: 08/23/2016 10:26 AM
>          Subject: Re: Apply ML to grouped dataframe
>          ------------------------------
>
>
>
>
>          You can use Spark MLlib
>          http://spark.apache.org/docs/latest/ml-guide.html#announcement-dataframe-based-api-is-primary-api
>
>          On Tue, Aug 23, 2016 at 7:34 AM, Wen Pei Yu <yuwenp@cn.ibm.com> wrote:
>             Hi
>
>                      We have a dataframe, and we want to group it and apply an
>                      ML algorithm or a statistic (say, a t-test) to each group.
>                      Is there an efficient way to do this?
>
>                      Currently, we switch to PySpark, use groupByKey and apply
>                      a NumPy function to each array. But this isn't an
>                      efficient way, right?
>
>                      Regards.
>                      Wenpei.
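A minimal sketch of a cheaper alternative to groupByKey plus NumPy for the t-test case, assuming a one-sample t-test against a fixed mean `mu0` (the question does not say which t-test is meant). Instead of collecting every group's values, each record becomes partial sums (count, sum, sum of squares) that merge associatively, which is the shape `reduceByKey` or `aggregateByKey` expects in Spark; only the final step computes the statistic. It is plain Python so it runs locally, and `to_stats`, `merge_stats`, `one_sample_t` and `t_per_group` are hypothetical names. The sum-of-squares variance formula can lose precision when values are large relative to their spread, so treat it as illustrative.

```python
import math


def to_stats(value):
    """Per-record partial statistics: (count, sum, sum of squares)."""
    return (1, value, value * value)


def merge_stats(a, b):
    """Associative combine step, the kind of function reduceByKey takes."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def one_sample_t(stats, mu0=0.0):
    """One-sample t statistic for H0: mean == mu0.

    Assumes count >= 2 and non-zero sample variance.
    """
    n, s, ss = stats
    mean = s / n
    var = (ss - n * mean * mean) / (n - 1)  # sample variance from partial sums
    return (mean - mu0) / math.sqrt(var / n)


def t_per_group(pairs, mu0=0.0):
    """Combine (key, value) pairs per key, then compute one t statistic per key."""
    combined = {}
    for key, value in pairs:
        st = to_stats(value)
        combined[key] = merge_stats(combined[key], st) if key in combined else st
    return {key: one_sample_t(st, mu0) for key, st in combined.items()}


if __name__ == "__main__":
    pairs = [("A", 1.0), ("B", 4.0), ("A", 2.0), ("B", 6.0),
             ("A", 3.0), ("B", 4.0), ("B", 6.0)]
    print(t_per_group(pairs))
```

In PySpark the same steps would be something like `rdd.mapValues(to_stats).reduceByKey(merge_stats).mapValues(one_sample_t)`, which combines partial sums on each partition before the shuffle instead of materializing every group's values the way `groupByKey` does.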
>
>
>
>
>          --
>
>          Thanks & regards,
>          Nirmal
>
>          Team Lead - WSO2 Machine Learner
>          Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>          Mobile: +94715779733
>          Blog: http://nirmalfdo.blogspot.com/
>
