spark-user mailing list archives

From 颜发才(Yan Facai) <yaf...@gmail.com>
Subject Re: How to iterate the element of an array in DataFrame?
Date Fri, 21 Oct 2016 07:35:37 GMT
I don't know how to construct
`array<struct<category:string,weight:string>>`.
Could anyone help me?

I tried to get the array with:
scala> mblog_tags.map(_.getSeq[(String, String)](0))

but the result is:
res40: org.apache.spark.sql.Dataset[Seq[(String, String)]] = [value: array<struct<_1:string,_2:string>>]


How can I express `struct<string, string>`?
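One possible approach, offered as a sketch rather than a confirmed answer from this thread: at runtime the elements of an `array<struct<...>>` column are `org.apache.spark.sql.Row` objects, so the element type to request is `Row`, and the fields can then be read by name. A case class with matching field names is one way to construct a column of type `array<struct<category:string,weight:string>>`. The `Tag` case class, the `StructArrayExample` object, the sample rows, and the local session below are all assumptions made up for illustration; the code assumes Spark 2.x on the classpath.

```scala
import org.apache.spark.sql.{Row, SparkSession}

// Hypothetical case class; its field names become the struct field names.
case class Tag(category: String, weight: String)

object StructArrayExample {
  // Builds a small stand-in for mblog_tags and returns the weights found in each row.
  def collectWeights(spark: SparkSession): Seq[Seq[String]] = {
    import spark.implicits._

    // Made-up sample rows; the column type is
    // array<struct<category:string,weight:string>>.
    val mblogTags = Seq(
      Seq(Tag("tagCategory_060", "0.8"), Tag("tagCategory_029", "0.7")),
      Seq(Tag("tagCategory_029", "0.9"))
    ).toDF("category.firstCategory")
    mblogTags.printSchema()

    // Request Seq[Row] instead of Seq[(String, String)], then read fields by name.
    mblogTags
      .map(row => row.getSeq[Row](0).map(_.getAs[String]("weight")))
      .collect()
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("struct-array-sketch")
      .getOrCreate()
    try collectWeights(spark).foreach(ws => println(ws.mkString(", ")))
    finally spark.stop()
  }
}
```

When the column is referenced by name rather than by position, the dot in `category.firstCategory` has to be escaped with backticks, for example `` col("`category.firstCategory`") ``. The sketch does not handle null arrays or null elements, which the schema allows.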



On Thu, Oct 20, 2016 at 4:34 PM, 颜发才(Yan Facai) <yafc18@gmail.com> wrote:

> Hi, I want to extract the `weight` attribute from each element of an array
> and combine the values into a sparse vector.
>
> ### My data looks like this:
>
> scala> mblog_tags.printSchema
> root
>  |-- category.firstCategory: array (nullable = true)
>  |    |-- element: struct (containsNull = true)
>  |    |    |-- category: string (nullable = true)
>  |    |    |-- weight: string (nullable = true)
>
>
> scala> mblog_tags.show(false)
> +--------------------------------------------------------------+
> |category.firstCategory                                        |
> +--------------------------------------------------------------+
> |[[tagCategory_060, 0.8], [tagCategory_029, 0.7]]              |
> |[[tagCategory_029, 0.9]]                                      |
> |[[tagCategory_029, 0.8]]                                      |
> +--------------------------------------------------------------+
>
>
> ### Expected result:
> Vectors.sparse(100, Array(60, 29),  Array(0.8, 0.7))
> Vectors.sparse(100, Array(29),  Array(0.9))
> Vectors.sparse(100, Array(29),  Array(0.8))
>
> How can I iterate over an array in a DataFrame?
> Thanks.
>
>
>
>
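
For the conversion described in the quoted message, the per-row logic can be sketched in plain Scala, independent of Spark. This assumes that the vector index is the numeric suffix of labels such as `tagCategory_060`, which the expected output suggests but the thread does not state. The object and function names are hypothetical. Spark documents that the index array passed to `Vectors.sparse` must be strictly increasing, so the sketch sorts the pairs by index; the first sample row therefore comes out as indices (29, 60) with values (0.7, 0.8) rather than in the order written above.

```scala
object TagsToSparse {
  // Assumption: the index is the number after the last underscore,
  // e.g. "tagCategory_060" -> 60. Throws NumberFormatException otherwise.
  def toIndex(category: String): Int =
    category.substring(category.lastIndexOf('_') + 1).toInt

  // Converts (category, weight) string pairs into parallel arrays of
  // indices and values, sorted by index.
  def toSparseArgs(tags: Seq[(String, String)]): (Array[Int], Array[Double]) = {
    val sorted = tags
      .map { case (category, weight) => (toIndex(category), weight.toDouble) }
      .sortBy(_._1)
    (sorted.map(_._1).toArray, sorted.map(_._2).toArray)
  }

  def main(args: Array[String]): Unit = {
    val (indices, values) =
      toSparseArgs(Seq(("tagCategory_060", "0.8"), ("tagCategory_029", "0.7")))
    println(indices.mkString(", "))
    println(values.mkString(", "))
  }
}
```

With Spark on the classpath, the two arrays can be passed to `org.apache.spark.ml.linalg.Vectors.sparse(100, indices, values)`, where the size 100 is taken from the expected output above.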
