spark-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: scala RDD[MyCaseClass] to Dataset[MyCaseClass] performance
Date Mon, 13 Jul 2020 23:12:48 GMT
Wouldn't toDS() do this without conversion?
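
For reference, a minimal sketch of what the toDS() route could look like. The case class mirrors the schema quoted below; the object name ToDsSketch, the helpers sampleRdd and convert, the local master setting, and the two sample records are assumptions made up for illustration, not taken from the original job. As far as I know the toDS() call itself is lazy, so any encoding cost is only paid once an action such as count() runs.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Dataset, SparkSession}

// Schema as described in the quoted question.
// Defined at top level so Spark can derive an encoder for it.
case class MyCaseClass(
  str01: String,
  str02: String,
  str03: String,
  str04: String,
  long01: Long,
  long02: Long,
  double01: Double,
  map: Map[String, Double])

object ToDsSketch {

  // Hypothetical sample data; the real job builds its RDD elsewhere.
  def sampleRdd(spark: SparkSession): RDD[MyCaseClass] =
    spark.sparkContext.parallelize(Seq(
      MyCaseClass("a", "b", "c", "d", 1L, 2L, 3.0, Map("k1" -> 1.0)),
      MyCaseClass("e", "f", "g", "h", 4L, 5L, 6.0, Map("k2" -> 2.0))))

  // toDS() comes from the session's implicits and needs an
  // Encoder[MyCaseClass], which is derived for case classes.
  def convert(spark: SparkSession, rdd: RDD[MyCaseClass]): Dataset[MyCaseClass] = {
    import spark.implicits._
    rdd.toDS()
  }

  def main(args: Array[String]): Unit = {
    // local[*] is an assumption so the sketch can run on one machine.
    val spark = SparkSession.builder()
      .appName("toDS-sketch")
      .master("local[*]")
      .getOrCreate()

    val ds = convert(spark, sampleRdd(spark))
    ds.printSchema()      // column names follow the case class fields
    println(ds.count())   // the action that triggers the actual work

    spark.stop()
  }
}
```

Timing the count() on the RDD and on the resulting Dataset separately would show how much of the 15 minutes is really spent in the conversion.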

On Mon, Jul 13, 2020 at 5:25 PM Ivan Petrov <capacytron@gmail.com> wrote:
>
> Hi!
> I'm trying to understand the cost of RDD to Dataset conversion.
> It takes me 60 minutes to create an RDD[MyCaseClass] with 500.000.000.000 records.
> It takes around 15 minutes to convert them to a Dataset[MyCaseClass].
> The schema of MyCaseClass is
> str01: String,
> str02: String,
> str03: String,
> str04: String,
> long01: Long,
> long02: Long,
> double01: Double,
> map: Map[String, Double]
>
> What can I do to make it run faster?
