spark-user mailing list archives

From Gourav Sengupta <gourav.sengupta.develo...@gmail.com>
Subject Re: Reading Large File in Pyspark
Date Thu, 03 Jun 2021 10:12:57 GMT
Hi,

could not agree more with Molotch :)


Regards,
Gourav Sengupta

On Thu, May 27, 2021 at 7:08 PM Molotch <magnn@kth.se> wrote:

> You can specify the line separator to make spark split your records into
> separate rows.
>
> df = spark.read.option("lineSep","^^^").text("path")
>
> Then you need to split the column into an array and map over it with
> getItem to create a column for each property. Note that the second
> argument to split is a regular expression, so the literal asterisks
> must be escaped:
>
> df.select(split($"value", "\\*\\*\\*").as("arrayColumn"))
>
> df.select((0 until 8).map(i => $"arrayColumn".getItem(i).as(s"col$i")): _*
> )
>
> Then you should have a DataFrame with each record on a row and each
> property
> in a column.
>
>
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>
>
