spark-user mailing list archives

From 颜发才(Yan Facai) <yaf...@gmail.com>
Subject How to iterate the element of an array in DataFrame?
Date Thu, 20 Oct 2016 08:34:03 GMT
Hi, I want to extract the `weight` attribute from each element of an array
column and combine the values to construct a sparse vector.

### My data looks like this:

scala> mblog_tags.printSchema
root
 |-- category.firstCategory: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- category: string (nullable = true)
 |    |    |-- weight: string (nullable = true)


scala> mblog_tags.show(false)
+--------------------------------------------------------------+
|category.firstCategory                                        |
+--------------------------------------------------------------+
|[[tagCategory_060, 0.8], [tagCategory_029, 0.7]]              |
|[[tagCategory_029, 0.9]]                                      |
|[[tagCategory_029, 0.8]]                                      |
+--------------------------------------------------------------+


### Expected output (one vector per row):
Vectors.sparse(100, Array(60, 29), Array(0.8, 0.7))
Vectors.sparse(100, Array(29), Array(0.9))
Vectors.sparse(100, Array(29), Array(0.8))

How can I iterate over the elements of an array column in a DataFrame?
Thanks.
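
One possible direction, as a rough sketch rather than a confirmed solution: a UDF receives an array-of-struct column as a `Seq[Row]`, so each element can be mapped to an (index, weight) pair and passed to `Vectors.sparse`. The sketch assumes spark-shell on Spark 2.x with `org.apache.spark.ml.linalg`, assumes every category has the form `tagCategory_NNN`, and takes the vector size 100 from the expected output above. The names `parseTag`, `toSparse`, and `features` are made up for illustration.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical helper: "tagCategory_060" -> 60, "0.8" -> 0.8.
// Assumes the "tagCategory_" prefix and numeric strings; throws otherwise.
def parseTag(category: String, weight: String): (Int, Double) =
  (category.stripPrefix("tagCategory_").toInt, weight.toDouble)

// An array<struct<category, weight>> column reaches the UDF as Seq[Row].
val toSparse = udf { (tags: Seq[Row]) =>
  val elems = Option(tags).getOrElse(Seq.empty[Row])
    .filter(_ != null) // the schema says containsNull = true
    .map(r => parseTag(r.getAs[String]("category"), r.getAs[String]("weight")))
  // This overload takes (index, value) pairs and orders them by index.
  // It is expected to reject duplicate indices within one row.
  Vectors.sparse(100, elems): Vector
}

// Backticks are needed because the column name itself contains a dot.
val withFeatures = mblog_tags.withColumn(
  "features", toSparse(col("`category.firstCategory`")))
```

With this overload the first row should come out with indices ordered as (29, 60) rather than (60, 29), which describes the same vector. Calling `withFeatures.show(false)` would be one way to inspect the result.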
