spark-user mailing list archives

From V0lleyBallJunki3 <venkatda...@gmail.com>
Subject Does explode lead to more usage of memory
Date Sun, 19 Jan 2020 00:50:13 GMT
I am using a dataframe with a structure like this:

root
 |-- orders: array (nullable = true)
 |    |-- element: struct (containsNull = true) 
 |    |    |-- amount: double (nullable = true)
 |    |    |-- id: string (nullable = true)
 |-- user: string (nullable = true)
 |-- language: string (nullable = true)

Each user has multiple orders. Now if I explode orders like this:

df.select($"user", $"language", explode($"orders").as("order"))

each order element becomes a row with the user and language duplicated. I was wondering whether Spark actually materializes each order element as a separate row in memory, or whether this is just logical. If a single user has 1000 orders, wouldn't it lead to much higher memory consumption, since user and language are duplicated 1000 times (once per order) in memory?
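For illustration, here is a minimal plain-Python sketch (not Spark itself, and the data is hypothetical) of what explode does logically: each element of the orders array becomes its own output row, and the other columns (user, language) are repeated on every row.

```python
# Hypothetical input rows matching the schema above:
# user, language, and an array of order structs.
rows = [
    {"user": "alice", "language": "en",
     "orders": [{"id": "o1", "amount": 10.0}, {"id": "o2", "amount": 5.5}]},
    {"user": "bob", "language": "de",
     "orders": [{"id": "o3", "amount": 7.25}]},
]

def explode_orders(rows):
    """Sketch of explode semantics: one output row per array element,
    with the remaining columns copied onto each row."""
    for row in rows:
        for order in row["orders"]:
            yield {"user": row["user"],
                   "language": row["language"],
                   "order": order}

exploded = list(explode_orders(rows))
# alice contributes two rows, bob one; user/language appear on each.
```

This only shows the logical result. Whether Spark physically copies the user and language values per row, or shares references to them until the data is serialized (e.g. at a shuffle boundary), is an internal detail of its columnar/Tungsten representation.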



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org

