spark-user mailing list archives

From mathewwicks <mathew.wi...@gmail.com>
Subject Re: Do we support excluding the current row in PARTITION BY windowing functions?
Date Mon, 03 Apr 2017 09:17:03 GMT
I am not sure why, but the mailing list is saying: "This post has NOT been
accepted by the mailing list yet".

On Mon, 3 Apr 2017 at 20:52 mathewwicks [via Apache Spark User List] <
ml-node+s1001560n28558h32@n3.nabble.com> wrote:

> Here is an example to illustrate my point.
>
> In this toy example, we collect a list of the other products that each
> user has bought and append it as a new column. (Note that we also filter
> on an arbitrary column, 'good_bad'.)
>
> I would like to know whether we support excluding the CURRENT ROW from the
> window defined by PARTITION BY.
> (E.g. transaction 1 would have `other_purchases = [prod2, prod3]` rather
> than `other_purchases = [prod1, prod2, prod3]`.)
>
> ------------------- Code Below -------------------
>
> df = spark.createDataFrame([
>     (1, "user1", "prod1", "good"),
>     (2, "user1", "prod2", "good"),
>     (3, "user1", "prod3", "good"),
>     (4, "user2", "prod3", "bad"),
>     (5, "user2", "prod4", "good"),
>     (6, "user2", "prod5", "good")],
>     ("trans_id", "user_id", "prod_id", "good_bad")
> )
> df.show()
>
> df = df.selectExpr(
>     "trans_id",
>     "user_id",
>     "COLLECT_LIST(CASE WHEN good_bad == 'good' THEN prod_id END) "
>     "OVER(PARTITION BY user_id) AS other_purchases"
> )
> df.show()
> ----------------------------------------------------
>



