spark-user mailing list archives

From JG Perrin <jper...@lumeris.com>
Subject RE: Joining 2 dataframes, getting result as nested list/structure in dataframe
Date Thu, 24 Aug 2017 13:19:07 GMT
Thanks Michael – this is a great article… very helpful

From: Michael Armbrust [mailto:michael@databricks.com]
Sent: Wednesday, August 23, 2017 4:33 PM
To: JG Perrin <jperrin@lumeris.com>
Cc: user@spark.apache.org
Subject: Re: Joining 2 dataframes, getting result as nested list/structure in dataframe

You can create a nested struct that contains multiple columns using struct().

Here's a pretty complete guide on working with nested data: https://databricks.com/blog/2017/02/23/working-complex-data-formats-structured-streaming-apache-spark-2-1.html
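
Applied to the aggregation in the question quoted below, the struct() suggestion would presumably look like `collect_list(struct(col("c50"), col("c51")))`, where `c51` is a hypothetical second column of `dDf`. The sketch below is not Spark code. It is a plain-Java model of the shape that aggregation should produce: one list of multi-field structs per `(c1, c2)` key. It runs without a Spark installation. The class names, field names, and sample values are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of groupBy(c1, c2).agg(collect_list(struct(c50, c51))).
// No Spark involved; c51 and every value used here are hypothetical.
class NestedJoinSketch {

    // One row of the joined data: the two join keys plus two right-side columns.
    static final class JoinedRow {
        final int c1;
        final int c2;
        final int c50;
        final String c51;

        JoinedRow(int c1, int c2, int c50, String c51) {
            this.c1 = c1;
            this.c2 = c2;
            this.c50 = c50;
            this.c51 = c51;
        }
    }

    // Stand-in for struct(c50, c51): several columns bundled into one value.
    static final class Detail {
        final int c50;
        final String c51;

        Detail(int c50, String c51) {
            this.c50 = c50;
            this.c51 = c51;
        }

        @Override
        public String toString() {
            return "{" + c50 + ", " + c51 + "}";
        }
    }

    // Groups rows by (c1, c2) and collects one Detail per row.
    // This model keeps encounter order; Spark's collect_list makes no such
    // guarantee once the data has been shuffled.
    static Map<List<Integer>, List<Detail>> nest(List<JoinedRow> rows) {
        Map<List<Integer>, List<Detail>> out = new LinkedHashMap<>();
        for (JoinedRow r : rows) {
            out.computeIfAbsent(Arrays.asList(r.c1, r.c2), k -> new ArrayList<>())
               .add(new Detail(r.c50, r.c51));
        }
        return out;
    }

    public static void main(String[] args) {
        List<JoinedRow> joined = Arrays.asList(
                new JoinedRow(3744, 1160242, 2, "b"),
                new JoinedRow(3744, 1160242, 1, "a"),
                new JoinedRow(3739, 1150097, 1, "x"));
        nest(joined).forEach((key, details) ->
                System.out.println(key + " -> " + details));
    }
}
```

Each map value plays the role of the array-of-struct column: every element carries all the bundled fields, rather than the single `c50` value that a bare `collect_list(col("c50"))` collects.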

On Wed, Aug 23, 2017 at 2:30 PM, JG Perrin <jperrin@lumeris.com> wrote:
Hi folks,

I am trying to join 2 dataframes, and I would like the result to hold the matching rows of
the right dataframe (dDf in the example) as a list in a column of the left dataframe (cDf in the example).
I made it work with one column, but I am having trouble adding more columns or creating a row(?).

    Seq<String> joinColumns = new Set2<>("c1", "c2").toSeq();
    Dataset<Row> allDf = cDf.join(dDf, joinColumns, "inner");
    allDf.printSchema();
    allDf.show();

    Dataset<Row> aggDf = allDf.groupBy(cDf.col("c1"), cDf.col("c2"))
            .agg(collect_list(col("c50")));
    aggDf.show();

Output:
+--------+-------+---------------------------+
|c1      |c2     |collect_list(c50)          |
+--------+-------+---------------------------+
|    3744|1160242|         [6, 5, 4, 3, 2, 1]|
|    3739|1150097|                        [1]|
|    3780|1159902|            [5, 4, 3, 2, 1]|
|     132|1200743|               [4, 3, 2, 1]|
|    3778|1183204|                        [1]|
|    3766|1132709|                        [1]|
|    3835|1146169|                        [1]|
+--------+-------+---------------------------+

Thanks,

jg

