spark-user mailing list archives

From Michael Armbrust <mich...@databricks.com>
Subject Re: Column explode a map
Date Thu, 24 Mar 2016 20:20:48 GMT
If you know the map keys ahead of time, you can just extract them
directly.

Here are a few examples:
<https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1023043053387187/1826874816150656/2840265927289860/latest.html>
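A minimal sketch of that direct extraction, shown with plain Scala collections so it runs without a Spark session. In DataFrame terms the equivalent would be selecting `col("map").getItem(k).as(k)` for each known key `k` (the key list here is an assumption, taken from the example data below):

```scala
// The example data from the question, as a plain Seq instead of a DataFrame.
val events = Seq(
  ("a", Map("a" -> 1, "b" -> 1)),
  ("b", Map("b" -> 1, "c" -> 1)),
  ("c", Map("a" -> 1, "c" -> 1))
)

// Assumed known ahead of time, per the reply above.
val keys = Seq("a", "b", "c")

// One row per id: look each key up directly in the map.
// None plays the role of SQL null for missing keys.
val rows: Seq[(String, Seq[Option[Int]])] =
  events.map { case (id, m) => (id, keys.map(m.get)) }

rows.foreach { case (id, vs) =>
  println((id +: vs.map(_.fold("null")(_.toString))).mkString("|"))
}
```

The DataFrame version of the same idea is a single `select` with no shuffle, roughly `events.select(col("id") +: keys.map(k => col("map").getItem(k).as(k)): _*)`, which is why it scales better than explode-plus-pivot when the keys are known.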

On Thu, Mar 24, 2016 at 12:01 PM, Michał Zieliński <zielinski.michal0@gmail.com> wrote:

> Hi,
>
> Imagine you have a structure like this:
>
> val events = sqlContext.createDataFrame(
>    Seq(
>      ("a", Map("a"->1,"b"->1)),
>      ("b", Map("b"->1,"c"->1)),
>      ("c", Map("a"->1,"c"->1))
>    )
>  ).toDF("id","map")
>
> What I want to achieve is have the map values as a separate columns.
> Basically I want to achieve this:
>
> +---+----+----+----+
> | id|   a|   b|   c|
> +---+----+----+----+
> |  a|   1|   1|null|
> |  b|null|   1|   1|
> |  c|   1|null|   1|
> +---+----+----+----+
>
> I managed to create it with an explode-pivot combo, but for a large
> dataset with a list of around 1000 map keys I imagine this will
> be prohibitively expensive. I reckon there must be a much easier way to
> achieve that than:
>
> val exploded = events
>   .select(col("id"), explode(col("map")))
>   .groupBy("id")
>   .pivot("key")
>   .sum("value")
>
> Any help would be appreciated. :)
>
