spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mahesh Sawaiker <mahesh_sawai...@persistent.com>
Subject RE: Equivalent of Redshift ListAgg function in Spark (Pyspak)
Date Mon, 09 Oct 2017 06:14:36 GMT
After doing group, you can use mkstring on the data frame. Following is an example where are
columns are concatenated with space as a separator.



scala> call_cdf.map(row => row.mkString(" ")).show(false)
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|value                                                                                   
                                                                                         
                                                                                         
                                                                                         
              |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|1 AAAAAAAABAAAAAAA 1998-01-01 null null 2450997 NY Metro large 2325 1374075 8AM-12AM Keith
Cunningham 4 Matters may hear as; profita New, cold plants can put al Dante Cook 3 pri 4 ese
995 Park 3rd Dr. Suite 470 Five Points Ziebach County SD 56098 United States -6.0 0.02   
                                                                                         
         |
|2 AAAAAAAACAAAAAAA 1998-01-01 2000-12-31 null 2450876 Mid Atlantic large 4208 837392 8AM-4PM
Stephen Clem 3 Classes devote largely other, standard ter Free germans prove flatly industrial
drugs. Low questions come to a equations. British, conservative Christopher Perez 6 cally
3 pri 245 Johnson  Circle Suite 200 Fairview Williamson County TN 35709 United States -5.0
0.03|
|3 AAAAAAAACAAAAAAA 2001-01-01 null null 2450876 Mid Atlantic small 3251 837392 8AM-4PM William
Johnson 3 Classes devote largely other, standard ter Ridiculous requirements must not implement
about pure values. Substances know powers. Political rel Derrick Burke 6 cally 3 pri 245 Johnson
 Circle Suite 200 Fairview Williamson County TN 35709 United States -5.0 0.03        |
|4 AAAAAAAAEAAAAAAA 1998-01-01 2000-01-01 null 2450872 North Midwest large 2596 708708 8AM-4PM
Lamont Greene 3 Events must find anyway  Great rates must ensure famous, other banks. As main
goals get home as a  Marvin Dean 2 able 2 able 927 Oak Main ST Suite 150 Five Points Williamson
County TN 36098 United States -5.0 0.03                                                  |
|5 AAAAAAAAEAAAAAAA 2000-01-02 2001-12-31 null 2450872 North Midwest medium 2596 708708 8AM-12AM
Lamont Greene 3 Events must find anyway  So fresh supplies keep meanwhile religious, labour
years. Rapid, careful subject Matthew Williams 2 able 1 able 927 Oak Main ST Suite 150 Five
Points Williamson County TN 36098 United States -5.0 0.0                                 
   |
|6 AAAAAAAAEAAAAAAA 2002-01-01 null null 2450872 North Midwest small 2596 708708 8AM-4PM Emilio
Romano 6 As well novel sentences check through the plans. Sophisticated cities fall for e
William Johnson 5 anti 1 able 927 Oak Main ST Suite 150 Five Points Williamson County TN 36098
United States -5.0 0.07                                                                  
   |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+


+-----------------+-----------------+-----------------+---------------+-----------------+---------------+-------------+--------+------------+--------+--------+----------------+---------+--------------------+--------------------+-----------------+-----------+----------------+----------+---------------+----------------+--------------+--------------+---------------+-----------+-----------------+--------+------+-------------+-------------+-----------------+
|cc_call_center_sk|cc_call_center_id|cc_rec_start_date|cc_rec_end_date|cc_closed_date_sk|cc_open_date_sk|
     cc_name|cc_class|cc_employees|cc_sq_ft|cc_hours|      cc_manager|cc_mkt_id|        cc_mkt_class|
        cc_mkt_desc|cc_market_manager|cc_division|cc_division_name|cc_company|cc_company_name|cc_street_number|cc_street_name|cc_street_type|cc_suite_number|
   cc_city|        cc_county|cc_state|cc_zip|   cc_country|cc_gmt_offset|cc_tax_percentage|
+-----------------+-----------------+-----------------+---------------+-----------------+---------------+-------------+--------+------------+--------+--------+----------------+---------+--------------------+--------------------+-----------------+-----------+----------------+----------+---------------+----------------+--------------+--------------+---------------+-----------+-----------------+--------+------+-------------+-------------+-----------------+
|                1| AAAAAAAABAAAAAAA|       1998-01-01|           null|             null|
       2450997|     NY Metro|   large|        2325| 1374075|8AM-12AM|Keith Cunningham|   
    4|Matters may hear ...|New, cold plants ...|       Dante Cook|          3|           
 pri|         4|            ese|             995|      Park 3rd|           Dr.|      Suite
470|Five Points|   Ziebach County|      SD| 56098|United States|         -6.0|           
 0.02|
|                2| AAAAAAAACAAAAAAA|       1998-01-01|     2000-12-31|             null|
       2450876| Mid Atlantic|   large|        4208|  837392| 8AM-4PM|    Stephen Clem|   
    3|Classes devote la...|Free germans prov...|Christopher Perez|          6|           cally|
        3|            pri|             245|      Johnson |        Circle|      Suite 200|
  Fairview|Williamson County|      TN| 35709|United States|         -5.0|             0.03|
|                3| AAAAAAAACAAAAAAA|       2001-01-01|           null|             null|
       2450876| Mid Atlantic|   small|        3251|  837392| 8AM-4PM| William Johnson|   
    3|Classes devote la...|Ridiculous requir...|    Derrick Burke|          6|           cally|
        3|            pri|             245|      Johnson |        Circle|      Suite 200|
  Fairview|Williamson County|      TN| 35709|United States|         -5.0|             0.03|
|                4| AAAAAAAAEAAAAAAA|       1998-01-01|     2000-01-01|             null|
       2450872|North Midwest|   large|        2596|  708708| 8AM-4PM|   Lamont Greene|   
    3|Events must find ...|Great rates must ...|      Marvin Dean|          2|           
able|         2|           able|             927|      Oak Main|            ST|      Suite
150|Five Points|Williamson County|      TN| 36098|United States|         -5.0|           
 0.03|
|                5| AAAAAAAAEAAAAAAA|       2000-01-02|     2001-12-31|             null|
       2450872|North Midwest|  medium|        2596|  708708|8AM-12AM|   Lamont Greene|   
    3|Events must find ...|So fresh supplies...| Matthew Williams|          2|           
able|         1|           able|             927|      Oak Main|            ST|      Suite
150|Five Points|Williamson County|      TN| 36098|United States|         -5.0|           
  0.0|
|                6| AAAAAAAAEAAAAAAA|       2002-01-01|           null|             null|
       2450872|North Midwest|   small|        2596|  708708| 8AM-4PM|   Emilio Romano|   
    6|As well novel sen...|Sophisticated cit...|  William Johnson|          5|           
anti|         1|           able|             927|      Oak Main|            ST|      Suite
150|Five Points|Williamson County|      TN| 36098|United States|         -5.0|           
 0.07|
+-----------------+-----------------+-----------------+---------------+-----------------+---------------+-------------+--------+------------+--------+--------+----------------+---------+--------------------+--------------------+-----------------+-----------+----------------+----------+---------------+----------------+--------------+--------------+---------------+-----------+-----------------+--------+------+-------------+-------------+-----------------+

From: Somasundaram Sekar [mailto:somasundar.sekar@tigeranalytics.com]
Sent: Sunday, October 08, 2017 5:30 PM
To: user@spark.apache.org
Subject: Equivalent of Redshift ListAgg function in Spark (Pyspak)

Hi,

I want to concat multiple columns into a single column after grouping the  DataFrame,

I want an functional equivalent of Redshift ListAgg function

pg_catalog.Listagg(column, '|')
         within GROUP( ORDER BY column) AS
name

LISTAGG Function
: For each group in a query, the LISTAGG aggregate function orders the rows for that group
according to the ORDER BY expression, then concatenates the values into a single string.
DISCLAIMER
==========
This e-mail may contain privileged and confidential information which is the property of Persistent
Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed.
If you are not the intended recipient, you are not authorized to read, retain, copy, print,
distribute or use this message. If you have received this communication in error, please notify
the sender and delete all copies of this message. Persistent Systems Ltd. does not accept
any liability for virus infected mails.
Mime
View raw message