spark-user mailing list archives

From Adrian Tanase <atan...@adobe.com>
Subject Re: Pivot Data in Spark and Scala
Date Fri, 30 Oct 2015 10:50:06 GMT
It’s actually a bit tougher, as you’ll first need all the years. I’m also not sure how you would
represent your “columns”, given that they are dynamic and depend on the input data.

Depending on your downstream processing, I’d probably try to emulate it with a hash map
with years as keys instead of the columns.
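
A minimal sketch of the hash-map idea, using the sample rows from the question. It uses plain Scala collections so it runs without a cluster; the names (rows, byKey, years, lines) are made up for illustration. On an RDD, the same per-key maps could presumably be built with reduceByKey.

```scala
// Sample rows from the original question: (key, year, value).
val rows = Seq(
  ("A", 2015, 4), ("A", 2014, 12), ("A", 2013, 1),
  ("B", 2015, 24), ("B", 2013, 4)
)

// One map per key, with years as map keys instead of columns.
val byKey: Map[String, Map[Int, Int]] =
  rows.groupBy(_._1).map { case (key, rs) =>
    key -> rs.map { case (_, year, value) => year -> value }.toMap
  }

// All years must be known before the maps can be laid out as columns.
// Newest first, to match the order used in the question.
val years: Seq[Int] = rows.map(_._2).distinct.sorted.reverse

// Render one line per key; a year missing for that key becomes an empty cell.
val lines: Seq[String] = byKey.toSeq.sortBy(_._1).map { case (key, m) =>
  (key +: years.map(y => m.get(y).map(_.toString).getOrElse(""))).mkString(", ")
}

lines.foreach(println)
```

Keeping the maps around (rather than rendering lines right away) avoids committing to a fixed set of columns until the full list of years is known.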

There is probably a nicer solution using the DataFrame API, but I’m not familiar with it.

If you actually need vectors, I think this article I saw recently on the Databricks blog will
highlight some options (look for gather encoder):
https://databricks.com/blog/2015/10/20/audience-modeling-with-spark-ml-pipelines.html

-adrian

From: Deng Ching-Mallete
Date: Friday, October 30, 2015 at 4:35 AM
To: Ascot Moss
Cc: User
Subject: Re: Pivot Data in Spark and Scala

Hi,

You could transform it into a pair RDD then use the combineByKey function.
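
A rough sketch of what that could look like, assuming the input has already been parsed into (key, year, value) tuples. The object and method names (PivotSketch, yearMaps, pivot) are hypothetical, and the local master is only there to make the example runnable on one machine. In spark-shell, use the existing sc instead of creating a new context.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object PivotSketch {
  // Collects, per key, a Map(year -> value) using combineByKey.
  def yearMaps(rows: RDD[(String, Int, Int)]): RDD[(String, Map[Int, Int])] =
    rows
      .map { case (key, year, value) => (key, (year, value)) } // pair RDD
      .combineByKey(
        (yv: (Int, Int)) => Map(yv),                       // first value seen for a key
        (acc: Map[Int, Int], yv: (Int, Int)) => acc + yv,  // add a value within a partition
        (a: Map[Int, Int], b: Map[Int, Int]) => a ++ b     // merge maps across partitions
      )

  // Lays out each key's values in year order (newest first).
  // None marks a year that is missing for that key.
  def pivot(rows: RDD[(String, Int, Int)]): Map[String, Seq[Option[Int]]] = {
    // The full set of years is needed up front, so this is an extra pass over the data.
    val years = rows.map(_._2).distinct().collect().sorted.reverse
    yearMaps(rows).mapValues(m => years.toSeq.map(m.get)).collect().toMap
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("pivot-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val rows = sc.parallelize(Seq(
        ("A", 2015, 4), ("A", 2014, 12), ("A", 2013, 1),
        ("B", 2015, 24), ("B", 2013, 4)
      ))
      // Print one comma-separated line per key, leaving missing years empty.
      pivot(rows).toSeq.sortBy(_._1).foreach { case (key, values) =>
        println((key +: values.map(_.map(_.toString).getOrElse(""))).mkString(", "))
      }
    } finally {
      sc.stop()
    }
  }
}
```

Note that pivot collects the result to the driver, which is only reasonable when the number of keys is small; for larger data, the RDD returned by yearMaps can be processed further without collecting.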

HTH,
Deng

On Thu, Oct 29, 2015 at 7:29 PM, Ascot Moss <ascot.moss@gmail.com> wrote:
Hi,

I have data as follows:

A, 2015, 4
A, 2014, 12
A, 2013, 1
B, 2015, 24
B, 2013, 4


I need to convert the data to a new format:
A,  4, 12, 1
B, 24,   , 4

Any idea how to do this in Spark with Scala?

Thanks

